How to use XPath to parse data in Python?

2 years ago

Noah Thompson

In Python, you can use the etree module from the lxml library to parse XML and query it with XPath expressions. Here is a simple example.

from lxml import etree

# Create an XML document and parse it
xml_data = """
<books>
    <book>
        <title>Python Cookbook</title>
        <author>David Beazley</author>
        <year>2013</year>
    </book>
    <book>
        <title>Fluent Python</title>
        <author>Luciano Ramalho</author>
        <year>2015</year>
    </book>
</books>
"""
root = etree.fromstring(xml_data)

# Select elements using XPath
titles = root.xpath("//title/text()")
authors = root.xpath("//author/text()")
years = root.xpath("//year/text()")

# Print the parsed results
for title, author, year in zip(titles, authors, years):
    print(f"Title: {title}")
    print(f"Author: {author}")
    print(f"Year: {year}")
    print("---")

The output is as follows:

Title: Python Cookbook
Author: David Beazley
Year: 2013
---
Title: Fluent Python
Author: Luciano Ramalho
Year: 2015
---

In the example above, etree.fromstring() parses the XML string into an Element object. The xpath() method then evaluates each XPath expression against that element and returns a list of matches. Because each expression ends in text(), the matches are the text content of the selected elements, returned as strings, so there is no need to read the .text attribute of each element.
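
One caveat with the example above: zip() pairs the three lists by position, so if one book were missing a <year>, the remaining values would shift and line up with the wrong book. A possible way around this, shown here as a minimal sketch rather than the only approach, is to select each <book> first and then run relative XPath expressions against it. The helper names first_text and parse_books, and the sample data with a missing <year>, are made up for illustration.

```python
from lxml import etree

# Sample data for illustration: the second book deliberately has no <year>.
xml_data = """
<books>
    <book>
        <title>Python Cookbook</title>
        <author>David Beazley</author>
        <year>2013</year>
    </book>
    <book>
        <title>Fluent Python</title>
        <author>Luciano Ramalho</author>
    </book>
</books>
"""


def first_text(node, expr):
    """Return the first match of an XPath expression, or None if there is none."""
    matches = node.xpath(expr)
    return matches[0] if matches else None


def parse_books(xml_text):
    """Build one dict per <book>, so fields from different books never mix."""
    root = etree.fromstring(xml_text)
    books = []
    for book in root.xpath("//book"):
        # The leading "./" makes each expression relative to the current <book>.
        books.append({
            "title": first_text(book, "./title/text()"),
            "author": first_text(book, "./author/text()"),
            "year": first_text(book, "./year/text()"),
        })
    return books


if __name__ == "__main__":
    for book in parse_books(xml_data):
        print(book["title"], book["author"], book["year"])
```

With this layout a missing field shows up as None for that book only, instead of silently misaligning the rest of the output.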