How to use regular expression in lxml xpath?

Question

I'm using a construction like this:

doc = parse(url).getroot()
links = doc.xpath("//a[text()='some text']")

But I need to select all links whose text begins with "some text", so I'm wondering whether there is any way to use a regexp here. I didn't find anything in the lxml documentation.

Steven · Accepted Answer

You can do this, although the example doesn't actually need regular expressions. lxml supports regular expressions through the EXSLT extension functions. (See the lxml docs for the XPath class; the same approach also works with the xpath() method.)

# re:test returns a boolean; "^" anchors the pattern to the start of the text
doc.xpath("//a[re:test(text(), '^some text')]",
          namespaces={"re": "http://exslt.org/regular-expressions"})

Note that you need to pass the namespace mapping so that lxml knows what the "re" prefix in the XPath expression stands for.
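
To make the namespace mapping concrete, here is a minimal self-contained sketch. The sample markup and the variable names are made up for illustration and are not from the question. It uses the EXSLT re:test function, which returns a boolean, and anchors the pattern with "^" so that only text beginning with "some text" matches.

```python
from lxml import html

# Hypothetical sample markup, invented for this illustration.
SAMPLE = """
<html><body>
  <a href="/1">some text here</a>
  <a href="/2">Some Text in caps</a>
  <a href="/3">not some text</a>
</body></html>
"""

# Maps the "re" prefix to the EXSLT regular-expressions namespace.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}

doc = html.fromstring(SAMPLE)

# "^" anchors the pattern to the start of the link text.
anchored = doc.xpath("//a[re:test(text(), '^some text')]",
                     namespaces=EXSLT_RE)

# The optional third argument holds flags; "i" ignores case.
ignore_case = doc.xpath("//a[re:test(text(), '^some text', 'i')]",
                        namespaces=EXSLT_RE)

anchored_hrefs = [a.get("href") for a in anchored]
ignore_case_hrefs = [a.get("href") for a in ignore_case]
```

With this sample, anchored_hrefs should contain only "/1", and ignore_case_hrefs should contain "/1" and "/2". The third link is skipped in both cases because its text does not start with the pattern.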

John Kugelman · Answer

You can use the starts-with() function:

doc.xpath("//a[starts-with(text(),'some text')]")
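
A small sketch of how this behaves, assuming made-up sample markup: starts-with() compares against the first text node exactly as it appears, so leading whitespace in the link text prevents a match. Wrapping the element's string value in the standard XPath normalize-space() function is one possible way around that; it is a variation on the answer, not part of it.

```python
from lxml import html

# Hypothetical sample markup, invented for this illustration.
SAMPLE = """
<html><body>
  <a href="/1">some text here</a>
  <a href="/2">
     some text with leading whitespace</a>
  <a href="/3">other text</a>
</body></html>
"""

doc = html.fromstring(SAMPLE)

# Plain prefix test on the first text node of each link.
plain = doc.xpath("//a[starts-with(text(), 'some text')]")

# normalize-space(.) trims and collapses whitespace before the test.
trimmed = doc.xpath("//a[starts-with(normalize-space(.), 'some text')]")

plain_hrefs = [a.get("href") for a in plain]
trimmed_hrefs = [a.get("href") for a in trimmed]
```

With this sample, plain_hrefs should contain only "/1", while trimmed_hrefs should also pick up "/2".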

Tags:

python

regex

xpath

lxml

Arty
