What is the basic syntax of XPath in Python?

2 years ago

Ava Mitchell

The basic syntax of using XPath in Python is as follows:

Import the relevant modules.

from lxml import etree

Create an Element object.

html = etree.HTML(text)

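Here, text is a string (or bytes) containing HTML markup, and etree.HTML returns the root element of the parsed document. To parse an HTML file from disk instead, use etree.parse together with etree.HTMLParser.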
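As a rough illustration of the parsing step, the sketch below builds the element from a string and then from a file. The sample markup and the temporary file are assumptions made up for this example; they are not part of the original answer.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample markup, used only for illustration.
text = "<html><body><h1>Title</h1><p class='intro'>Hello</p></body></html>"

# etree.HTML parses a string (or bytes) and returns the root element.
html = etree.HTML(text)
print(html.tag)

# For a file on disk, etree.parse with an HTMLParser returns an ElementTree;
# getroot() gives the same kind of element that etree.HTML returns.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as f:
    f.write(text)
    path = f.name
try:
    file_root = etree.parse(path, etree.HTMLParser()).getroot()
finally:
    os.remove(path)  # clean up the temporary file
print(file_root.tag)
```

Both routes end with an element that supports the xpath method used in the rest of this answer.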

Extract data using XPath expressions:

Selecting elements by tag name, attribute name, or attribute value:

elements = html.xpath('//tag')  # select elements by tag name
elements = html.xpath('//*[@attribute]')  # select elements that have the given attribute
elements = html.xpath('//tag[@attribute="value"]')  # select elements by attribute value

Here, 'tag' is the tag name, 'attribute' is the attribute name, and 'value' is the attribute value. Each call returns a list of matching elements, which is empty when nothing matches.
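To make the three selection patterns concrete, here is a minimal sketch that substitutes real names for the placeholders. The markup, tag names, and attribute values are invented for illustration.

```python
from lxml import etree

# Hypothetical sample markup.
text = """
<html><body>
  <a href="/home">Home</a>
  <a href="/about" class="nav">About</a>
  <p class="nav">Paragraph</p>
</body></html>
"""
html = etree.HTML(text)

links = html.xpath('//a')                    # by tag name
with_class = html.xpath('//*[@class]')       # any element that has a class attribute
nav_links = html.xpath('//a[@class="nav"]')  # by tag name and attribute value

print(len(links), [e.tag for e in with_class], nav_links[0].get('href'))
```

Results come back in document order, so the a element appears before the p element in with_class.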

Extracting text:

text = element.text  # text of a single element (only the text before its first child element)
texts = [element.text for element in elements]  # text of several elements
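One detail worth seeing in practice is that element.text stops at the first child element. The sketch below, using made-up markup, compares it with two XPath alternatives: the text() node test and the string() function.

```python
from lxml import etree

# Hypothetical markup with a nested <b> inside the first paragraph.
html = etree.HTML(
    "<html><body><p>Hello <b>big</b> world</p><p>Second</p></body></html>"
)
first = html.xpath('//p')[0]

leading_text = first.text               # only the text before <b>
full_text = first.xpath('string()')     # all text inside the element, children included
text_nodes = html.xpath('//p/text()')   # direct text nodes of every <p>

print(repr(leading_text))
print(repr(full_text))
print(text_nodes)
```

When an element may contain inline markup, string() (or "".join(element.itertext())) is usually the safer way to get its complete text.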

Extracting attribute values:

attribute = element.get('attribute')  # attribute value of a single element (None if the attribute is missing)
attributes = [element.get('attribute') for element in elements]  # attribute values of several elements
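Attribute values can also be selected directly in the XPath expression with @attribute. The following sketch, with invented markup, contrasts that with element.get, which yields None for elements lacking the attribute.

```python
from lxml import etree

# Hypothetical markup: the middle link has no href attribute.
html = etree.HTML(
    '<html><body><a href="/home">Home</a><a>No link</a>'
    '<a href="/about">About</a></body></html>'
)
links = html.xpath('//a')

hrefs_via_get = [a.get('href') for a in links]  # includes None for the missing href
hrefs_via_xpath = html.xpath('//a/@href')       # only attributes that exist
hrefs_default = [a.get('href', '') for a in links]  # get accepts a default value

print(hrefs_via_get)
print(hrefs_via_xpath)
```

The XPath form is shorter, while element.get keeps the result aligned one-to-one with the list of elements.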

Other common operations:

Selecting an element by index:

element = elements[index]  # Python list indexing on the result, starting at 0 (XPath position predicates such as //tag[1] start at 1)
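Because Python list indexes start at 0 and XPath positions start at 1, the two are easy to mix up. This small sketch, on a made-up list, shows both side by side.

```python
from lxml import etree

# Hypothetical markup: a single list with three items.
html = etree.HTML(
    "<html><body><ul><li>a</li><li>b</li><li>c</li></ul></body></html>"
)
items = html.xpath('//li')

first_by_list = items[0].text                       # Python indexing, 0-based
first_by_xpath = html.xpath('//li[1]')[0].text      # XPath position, 1-based
second_overall = html.xpath('(//li)[2]')[0].text    # second match in the whole document
last_by_xpath = html.xpath('//li[last()]')[0].text  # last li within its parent

# xpath returns an empty list when nothing matches, so guard before indexing.
missing = html.xpath('//table')
first_table = missing[0] if missing else None

print(first_by_list, first_by_xpath, second_overall, last_by_xpath, first_table)
```

Note that //li[1] means "every li that is the first li child of its parent", whereas (//li)[1] means "the first li in the document"; with a single list, as here, they coincide.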

Selecting elements using wildcards:

elements = html.xpath('//*')  # select all elements
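The wildcard is often more useful when scoped to part of the tree rather than applied to the whole document. A brief sketch with hypothetical markup:

```python
from lxml import etree

# Hypothetical markup.
html = etree.HTML(
    "<html><body><div><p>x</p><span>y</span></div></body></html>"
)

all_tags = [e.tag for e in html.xpath('//*')]        # every element in the document
div_children = [e.tag for e in html.xpath('//div/*')]  # only direct children of <div>

print(all_tags)
print(div_children)
```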

Selecting elements from several paths using the union operator (|):

elements = html.xpath('//tag1 | //tag2')  # select elements matching either path
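The | operator merges the results of two paths, while the logical operators and / or are written inside a predicate. The sketch below shows both on made-up markup; the tag names and attributes are assumptions for illustration.

```python
from lxml import etree

# Hypothetical markup.
html = etree.HTML(
    '<html><body><h1>A</h1><p id="x" class="note">B</p>'
    '<h2>C</h2><p class="note">D</p></body></html>'
)

# Union of two paths; results come back in document order.
headings = [e.tag for e in html.xpath('//h1 | //h2')]

# Logical operators belong inside the predicate.
both = [e.text for e in html.xpath('//p[@class="note" and @id]')]
either = [e.text for e in html.xpath('//*[self::h1 or @id="x"]')]

print(headings, both, either)
```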

The above covers the basic XPath syntax as used with lxml. XPath also offers axes, functions, and more complex predicates, which can be picked up as your needs require.