
Getting both <a> link text and regular text with Scrapy

Tags:

python

scrapy

I have the following span:

<span class="name">

    bla bla <a href="address">foo</a> bar
</span>

I want Scrapy to extract the entire sentence, including the link text but without the markup, meaning:
bla bla foo bar

How do I do that?

Boaz asked Jan 29 '26 at 19:01



1 Answer

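You can use the descendant-or-self::*/text() XPath expression. It selects the text nodes of the span itself and of every element nested inside it (here, the <a>), so the link text is kept while the tags are dropped: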

//span[@class="name"]/descendant-or-self::*/text()
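To see what that expression collects without starting Scrapy, here is a minimal standard-library sketch. It is an analogue, not what Scrapy runs internally (Scrapy's selectors are built on parsel and lxml), and it assumes the markup is well-formed enough for xml.etree.ElementTree to parse. The helper name span_text is hypothetical. itertext() yields an element's own text, each child's text and each child's tail (the " bar" after </a>), which is the same set of text nodes the XPath picks up.

```python
import xml.etree.ElementTree as ET


def span_text(markup: str) -> str:
    """Hypothetical helper: join every text node under <span class="name">.

    Stdlib analogue of //span[@class="name"]/descendant-or-self::*/text().
    """
    root = ET.fromstring(markup)
    parts = []
    # iter() includes the root element itself, so a top-level <span> matches too
    for span in root.iter("span"):
        if span.get("class") == "name":
            # span.text, then each child's text and tail, in document order
            parts.extend(span.itertext())
    return "".join(parts)


html = '<span class="name">bla bla <a href="address">foo</a> bar</span>'
print(span_text(html))  # bla bla foo bar
```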

Demo (using the Scrapy shell; the ./ prefix makes it load the local file instead of treating index.html as a domain name):

$ cat index.html
<span class="name">bla bla <a href="address">foo</a> bar</span>
$ scrapy shell ./index.html
>>> results = response.xpath('//span[@class="name"]/descendant-or-self::*/text()').getall()
>>> ''.join(results)
'bla bla foo bar'
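The demo file has no line breaks inside the span, but the markup in the question does, so joining its text nodes would also keep the newlines and indentation. A minimal sketch of cleaning that up follows, again using the standard library as a stand-in for Scrapy and assuming well-formed markup. Splitting on whitespace and rejoining with single spaces mirrors what XPath's normalize-space() does. Inside Scrapy, response.xpath('normalize-space(//span[@class="name"])').get() should give the same cleaned string in a single step, since normalize-space() works on the span's full string value, which includes its descendant text.

```python
import xml.etree.ElementTree as ET

# The span laid out as in the question, line breaks and indentation included
html = """<span class="name">

    bla bla <a href="address">foo</a> bar
</span>"""

span = ET.fromstring(html)
raw = "".join(span.itertext())  # still contains "\n\n    " and a trailing "\n"

# Collapse whitespace runs and trim both ends, like XPath normalize-space()
clean = " ".join(raw.split())
print(clean)  # bla bla foo bar
```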
alecxe answered Jan 31 '26 at 08:01



