
Scrapy - how to convert a string into an object that I can use XPath on?

Tags:

xpath

scrapy

Let's say I have some plain text in HTML-like format like this:

<div id="foo"><p id="bar">Some random text</p></div>

And I need to be able to run XPath on it to retrieve some inner element. How can I convert plain text to some kind of object which I could use XPath on?

FTM asked Oct 17 '25 06:10


2 Answers

You can wrap the string in a regular Selector and run the same XPath or CSS queries on it directly:

from scrapy import Selector

...

sel = Selector(text='<div id="foo"><p id="bar">Some random text</p></div>')
selected_xpath = sel.xpath('//div[@id="foo"]')
eLRuLL answered Oct 21 '25 01:10


You can pass the HTML as a string to lxml.html.fromstring and query the resulting element with XPath:

from lxml import html

code = """<div id="foo"><p id="bar">Some random text</p></div>"""
source = html.fromstring(code)
source.xpath('//div/p/text()')
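Note that xpath() in lxml returns a list, so in practice you usually want to guard against an empty result before indexing. A small sketch along those lines, assuming lxml is installed; the names root, matches and paragraph are just for illustration:

```python
from lxml import html

code = '<div id="foo"><p id="bar">Some random text</p></div>'

# fromstring() parses the string and returns an element you can query.
root = html.fromstring(code)

# xpath() returns a list of matches (elements here, since the
# expression selects a node rather than text()).
matches = root.xpath('//p[@id="bar"]')

# Guard against "no match" instead of indexing blindly.
paragraph = matches[0] if matches else None

if paragraph is not None:
    print(paragraph.get('id'))        # attribute lookup on the element
    print(paragraph.text_content())   # all text inside the element
```

Selecting the element first, rather than text() directly, keeps the attributes and child nodes available if you need them later.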
Andersson answered Oct 21 '25 03:10