Python Read Word Document: Extract Text Fast
Python can use the python-docx library to extract text from Word (.docx) documents. Here is an example:
from docx import Document

# Open the Word document
doc = Document('example.docx')

# Iterate over the document's paragraphs and print their text
for para in doc.paragraphs:
    print(para.text)

# Iterate over the document's tables and print each cell's text
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
In this example, we first import the Document class and create a Document object by passing it the path to a Word file. The paragraphs property then lets us iterate over the document's body paragraphs and print the text of each one. Text inside tables is not included in paragraphs, so we use the tables property to iterate over the document's tables row by row and print the text of each cell. Note that python-docx reads .docx files only; older .doc files must be converted first.
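If you want the text as a single string rather than printed line by line, the same two loops can be wrapped in a small helper. The following is a minimal sketch, not part of python-docx itself: the function names collect_text and extract_text are hypothetical, and skipping blank paragraphs and joining the cells of a row with tabs are assumptions made for illustration.

```python
def collect_text(doc):
    """Return the text of a Document-like object as one string.

    Hypothetical helper: works on any object that exposes
    .paragraphs and .tables the way a python-docx Document does.
    """
    lines = []
    # Body paragraphs, skipping ones that are empty or whitespace-only
    for para in doc.paragraphs:
        if para.text.strip():
            lines.append(para.text)
    # One output line per table row, cells separated by tabs
    for table in doc.tables:
        for row in table.rows:
            lines.append("\t".join(cell.text for cell in row.cells))
    return "\n".join(lines)


def extract_text(path):
    """Open a .docx file and return its text (hypothetical wrapper)."""
    # Imported here so collect_text stays usable without python-docx
    from docx import Document
    return collect_text(Document(path))
```

Calling print(extract_text('example.docx')) would then print the paragraphs first, followed by the table rows. This ordering does not reflect where the tables sit in the original document; it simply mirrors the two loops above.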