How to use Python 3 to convert multiple DOCX documents to TXT files

To batch-convert DOCX documents to TXT files with Python 3, you can use the python-docx library. Here is a simple example:

from docx import Document

def convert_docx_to_txt(docx_file, txt_file):
    doc = Document(docx_file)
    with open(txt_file, 'w', encoding='utf-8') as f:
        for paragraph in doc.paragraphs:
            f.write(paragraph.text + '\n')

# Batch conversion
docx_files = ['file1.docx', 'file2.docx', 'file3.docx']
for docx_file in docx_files:
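    # Build the output file name by swapping only the final extension
    # (str.replace would also change '.docx' appearing earlier in the name)
    txt_file = docx_file.rsplit('.', 1)[0] + '.txt'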
    convert_docx_to_txt(docx_file, txt_file)

The convert_docx_to_txt function takes a DOCX file path and a TXT file path, and writes the text of each paragraph in the DOCX document to the TXT file, one paragraph per line. Only the plain text is kept; formatting is not carried over. The docx_files list holds the names of the DOCX files to convert. The loop goes through the list and calls convert_docx_to_txt for each file.
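Note that doc.paragraphs covers only the body paragraphs, so text inside tables does not reach the TXT file. The following is a minimal sketch for the case where table text is also wanted. collect_text is a hypothetical helper name, not part of python-docx; it accepts any object that exposes paragraphs and tables the way a python-docx Document does, and it appends table rows after the paragraphs rather than keeping them in their original position.

```python
def collect_text(doc):
    """Return body-paragraph text followed by table text, one entry per line.

    `doc` is expected to look like a python-docx Document: it exposes
    `paragraphs` (each with `.text`) and `tables` (each with `.rows`,
    whose rows have `.cells`, whose cells have `.text`).
    """
    lines = [paragraph.text for paragraph in doc.paragraphs]
    for table in doc.tables:
        for row in table.rows:
            # Join the cells of one row with tabs so columns stay distinguishable.
            lines.append('\t'.join(cell.text for cell in row.cells))
    return '\n'.join(lines) + '\n'
```

With python-docx installed, the string returned by collect_text(Document('file1.docx')) could be written to the TXT file in place of the paragraph loop.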

The code depends on the python-docx library, which must be installed first. You can install it with the following command:

pip install python-docx

Make sure Python 3 and pip are installed. Because the file names in docx_files are relative paths, the DOCX files must be in the directory you run the script from.
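Instead of typing each file name by hand, the DOCX files in a folder could be collected automatically. This is a minimal sketch using only the standard library's pathlib module. find_docx_files and txt_path_for are hypothetical helper names, and skipping names that start with "~$" assumes that Word's temporary lock files should be ignored.

```python
from pathlib import Path

def find_docx_files(folder):
    """Return the .docx files directly inside `folder`, sorted by name."""
    return sorted(
        path for path in Path(folder).glob('*.docx')
        # Assumption: files starting with "~$" are Word lock files, not documents.
        if path.is_file() and not path.name.startswith('~$')
    )

def txt_path_for(docx_path):
    """Return the output path: same folder and name, with a .txt extension."""
    return Path(docx_path).with_suffix('.txt')
```

Each path returned by find_docx_files('.') can then be handed to convert_docx_to_txt together with txt_path_for(path). Passing str(path) and str(txt_path_for(path)) is the cautious choice, since plain string paths are what the original code uses.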
