How to use XPath to extract li tags in Python?

2 years ago

Liam

In Python, you can use XPath to retrieve the li tags from an HTML or XML document. To start, import a library that supports XPath, such as lxml, a third-party package that can be installed with pip install lxml.

Here is an example that uses XPath to extract li tags:

from lxml import etree

# Sample HTML (or XML) document as a string
html = """
<html>
  <body>
    <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
    </ul>
  </body>
</html>
"""

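# Create an HTML parser (lxml's HTMLParser tolerates imperfect markup)
parser = etree.HTMLParser()

# Parse the string into an element tree that can be queried with XPath
tree = etree.fromstring(html, parser)

# Use an XPath expression to select the li elements
li_tags = tree.xpath('//li')

# Iterate over the matched li elements and print the text of each one
for li in li_tags:
    print(li.text)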

Running the code above prints:

Item 1
Item 2
Item 3

In the XPath expression '//li', the leading // matches nodes at any depth in the document, and li is the element name to match. tree.xpath('//li') therefore returns a list of every li element, which you can iterate over to read the content of each one. Note that li.text holds only the text that comes before the element's first child, so it is None when an li starts with a nested tag.
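
Real pages often contain several lists, and li elements with nested tags. The sketch below is one possible way to handle both; the sample document and its id values ("menu", "footer") are made up for illustration. It narrows the match to one ul with an attribute predicate, uses itertext() to collect text from nested tags, and uses text() inside the XPath expression to get strings directly.

```python
from lxml import etree

# Hypothetical sample document: two lists, one item with nested markup.
html = """
<html>
  <body>
    <ul id="menu">
      <li>Home</li>
      <li><a href="/about">About</a> us</li>
    </ul>
    <ul id="footer">
      <li>Contact</li>
    </ul>
  </body>
</html>
"""

tree = etree.fromstring(html, etree.HTMLParser())

# Match only li elements that are direct children of the ul with id="menu".
menu_items = tree.xpath('//ul[@id="menu"]/li')

# li.text holds only the text before the first child element,
# so the second item gives None here.
direct_text = [li.text for li in menu_items]

# itertext() yields every text node inside the element, nested tags included.
full_text = ["".join(li.itertext()).strip() for li in menu_items]

# text() in the XPath expression returns the direct text nodes as strings.
footer_text = tree.xpath('//ul[@id="footer"]/li/text()')

print(direct_text)   # ['Home', None]
print(full_text)     # ['Home', 'About us']
print(footer_text)   # ['Contact']
```

If the items you need may contain links or other inline tags, itertext() is usually the safer choice; li.text and text() are enough when each li holds plain text only.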