For the following code:
<a class="title" href="the link">
Low price
<strong>computer</strong>
you should not miss
</a>
I used this XPath expression in Scrapy:
response.xpath('.//a[@class="title"]//text()[normalize-space()]').extract()
I got the following result:
u'\n \n Low price ', u'computer', u' you should not miss'
Why were the two \n characters and the extra spaces before "Low price" not removed by normalize-space() in this example?
Another question: how can I combine the three parts into one scraped item, u'Low price computer you should not miss'?
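When an element's text or attribute values contain extra whitespace, normalize-space() is the usual XPath tool for dealing with it. It strips leading and trailing whitespace from a string and collapses every internal run of spaces, tabs and newlines into a single space. In your expression, however, it sits inside a predicate: text()[normalize-space()] only tests each text node (a node whose normalized value is empty counts as false and is dropped), and the nodes that pass are returned unchanged. That is why the leading \n characters and spaces survive. To get one cleaned string, apply normalize-space() to the whole <a> element instead; its string value is the concatenation of all its descendant text nodes.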
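To see the difference between filtering and transforming outside of Scrapy, here is a minimal stdlib-only sketch. It assumes the snippet parses as XML with xml.etree.ElementTree (Scrapy itself evaluates XPath through parsel and lxml), and xpath_normalize_space and text_nodes are hypothetical helpers that mimic XPath 1.0 normalize-space() and //text(); XPath only treats space, tab, CR and LF as whitespace.

```python
import re
import xml.etree.ElementTree as ET

# The question's markup, with whitespace matching the scraped output.
HTML = '<a class="title" href="the link">\n \n Low price <strong>computer</strong> you should not miss</a>'

def xpath_normalize_space(s):
    """Hypothetical stand-in for XPath 1.0 normalize-space():
    collapse runs of XPath whitespace to one space, then trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")

def text_nodes(elem):
    """Yield the text nodes under elem in document order, like .//text()."""
    if elem.text is not None:
        yield elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail is not None:  # text that follows the child element
            yield child.tail

root = ET.fromstring(HTML)

# text()[normalize-space()]: the predicate only filters; surviving nodes
# keep their original whitespace.
filtered = [t for t in text_nodes(root) if xpath_normalize_space(t)]
print(filtered)  # ['\n \n Low price ', 'computer', ' you should not miss']

# Normalizing each node and joining the pieces gives the combined item.
cleaned = [xpath_normalize_space(t) for t in filtered]
print(" ".join(cleaned))  # Low price computer you should not miss
```

Joining the cleaned pieces with a space is only an approximation: it would insert a space where two text nodes meet without whitespace (for example com<b>put</b>er), whereas normalize-space() on the element keeps the original spacing.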
Please try this:
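response.xpath('normalize-space(.//a[@class="title"])').extract_first()
This should return u'Low price computer you should not miss' as a single string.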
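Why this works: normalize-space() receives the string value of the <a> element, i.e. all of its descendant text joined in document order, and cleans it in one step. The stdlib-only sketch below mimics that with ElementTree's itertext(); string_value and xpath_normalize_space are hypothetical helpers for illustration, not what Scrapy runs internally.

```python
import re
import xml.etree.ElementTree as ET

HTML = ('<div><a class="title" href="the link">\n \n Low price '
        '<strong>computer</strong> you should not miss</a></div>')

def xpath_normalize_space(s):
    """Hypothetical stand-in for XPath 1.0 normalize-space()."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")

def string_value(elem):
    """XPath string value of an element: all descendant text in document order."""
    return "".join(elem.itertext())

root = ET.fromstring(HTML)
link = root.find(".//a[@class='title']")  # ElementTree's limited XPath subset

raw = string_value(link)
print(repr(raw))                   # '\n \n Low price computer you should not miss'
print(xpath_normalize_space(raw))  # Low price computer you should not miss
```

One caveat: in XPath 1.0, normalize-space() applied to a node-set uses only the first matching node. If a page has several such links, select them first and normalize each one, e.g. [a.xpath('normalize-space(.)').extract_first() for a in response.xpath('//a[@class="title"]')].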