Python PDF to Excel Conversion Guide

2 years ago

Benjamin Taylor

To read a PDF file and write its text into an Excel file, you can use the PyPDF2 library to extract the text and the openpyxl library to create and save the workbook. The example below writes each line of extracted text to its own row in column A:

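import PyPDF2
from openpyxl import Workbook

# Create the Excel workbook and select its default worksheet
wb = Workbook()
ws = wb.active

# Open the PDF in binary mode; the with block closes the file automatically
with open('example.pdf', 'rb') as pdf_file:
    # PdfReader and .pages replace PdfFileReader, numPages and getPage,
    # which are deprecated and raise errors in PyPDF2 3.x
    pdf_reader = PyPDF2.PdfReader(pdf_file)

    # Write the PDF text to the worksheet, one line per row in column A
    for page in pdf_reader.pages:
        # Fall back to an empty string if a page yields no text
        text = page.extract_text() or ''
        for line in text.split('\n'):
            # append() adds a row below the last one, so later pages
            # do not overwrite the rows written for earlier pages
            ws.append([line])

# Save the Excel file
wb.save('output.xlsx')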
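
When a PDF has more than one page, the row position has to keep counting across pages, otherwise each page would be written over the previous one. Below is a minimal sketch of that bookkeeping, using plain strings in place of extracted page text. The helper name rows_from_pages is hypothetical and is not part of PyPDF2 or openpyxl, and skipping blank lines is an assumption you may not want.

```python
def rows_from_pages(page_texts):
    """Flatten per-page text into (page_number, line) rows.

    page_texts is assumed to be a list with one string per PDF page,
    such as the values returned by page.extract_text().
    """
    rows = []
    for page_number, text in enumerate(page_texts, start=1):
        # Treat a missing page text (None or '') as a page with no lines
        for line in (text or '').splitlines():
            if line.strip():  # skip blank lines (an assumption of this sketch)
                rows.append((page_number, line))
    return rows


# Stand-in data: three "pages", the second one empty
sample = ["Invoice 42\nTotal: 10", "", "Page three"]
for row in rows_from_pages(sample):
    print(row)
```

Each tuple can then be passed to ws.append(row), which would place the page number in column A and the line of text in column B.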

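Please note that this is only a simple example; you may need to adjust it to the structure and content of your PDF file. PyPDF2 extracts plain text only, so table layouts are not preserved, and scanned PDFs without a text layer yield no text. Hope this helps!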
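
As one example of such an adjustment: if the extracted lines happen to separate columns with runs of spaces (this depends entirely on the PDF and is an assumption here), each line can be split into several cells instead of one. A rough sketch follows; the helper name split_into_cells and the two-or-more-spaces rule are hypothetical choices and will not fit every document.

```python
import re


def split_into_cells(line):
    """Split one line of text into cell values on runs of 2+ whitespace characters."""
    stripped = line.strip()
    if not stripped:
        return []  # nothing to write for a blank line
    # A single space is kept inside a value; two or more start a new cell
    return re.split(r'\s{2,}', stripped)


print(split_into_cells("Widget A    3    19.99"))
```

The returned list can be passed to ws.append(...) so that each value lands in its own column. The values stay strings unless you convert them yourself.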
