I am running a spider that pulls information such as prices and shipping costs. The shipping information comes back like this: "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49". The code that extracts it looks like this:
item["shipping"] = vendor.xpath("normalize-space(.//span[@class='shippingAmount']/text())").extract()
Can I rewrite this line to pull just the price after "Shipping:"?
Use a combination of substring-after and substring-before, i.e.:
substring-before(
  substring-after(
    "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49",
    "Shipping:"),
  ","
)
XPath 1.0 cannot return all shipping amounts for an arbitrary number of shipping fees in a single expression. You can reach the 2nd, 3rd, ... value by applying substring-after($string, "Shipping:") repeatedly, each call stripping the preceding value.
(The line breaks in the expression can be omitted, of course.)
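The repeated substring-after step may be easier to follow when written out as a loop. The following is only a rough Python illustration, not XPath itself: the helper names substring_after, substring_before and shipping_values are hypothetical, and str.partition stands in for the XPath functions of the same name.

```python
def substring_after(s, sep):
    # Mirrors XPath substring-after: text following the first sep, "" if sep is absent.
    return s.partition(sep)[2]

def substring_before(s, sep):
    # Mirrors XPath substring-before: text preceding the first sep, "" if sep is absent.
    head, found, _ = s.partition(sep)
    return head if found else ""

def shipping_values(s, label="Shipping:"):
    values = []
    while label in s:
        # Drop everything up to and including the next label.
        s = substring_after(s, label)
        # The last value has no trailing comma, so substring_before would
        # return "" there; fall back to the whole remainder instead.
        value = substring_before(s, ",") if "," in s else s
        values.append(value)
    return values

raw = "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49"
print(shipping_values(raw))  # ['$.99', '', '', '$.49']
```

Each pass through the loop corresponds to one more nested substring-after call in the XPath expression, which is why the number of values has to be known in advance when staying inside XPath 1.0.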
You can extract the prices using a regular expression:
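import re
# Named "shipping" rather than "str" to avoid shadowing the built-in type.
shipping = "Shipping:$.99,Shipping:,Shipping:,Shipping:$.49"
# Optional integer part, optional decimal point, then at least one digit.
re.findall(r'\d*\.?\d+', shipping)
['.99', '.49']
To have 0 if there is no shipping:
[float(x) if x else 0 for x in re.sub(r'Shipping:\$?', '', shipping).split(',')]
[0.99, 0, 0, 0.49]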
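If the values are handled one at a time inside the spider rather than as one joined string, the same idea could be wrapped in a small helper. This is a sketch under assumptions: the name parse_shipping is hypothetical, and it assumes each raw value looks like "Shipping:$.99" or a bare "Shipping:".

```python
import re

# Optional "$", then digits with an optional decimal part, or a bare decimal such as ".99".
_AMOUNT = re.compile(r"\$?(\d+(?:\.\d+)?|\.\d+)")

def parse_shipping(raw):
    """Return the shipping amount as a float, or 0.0 when no amount is present."""
    match = _AMOUNT.search(raw)
    return float(match.group(1)) if match else 0.0

raw_values = ["Shipping:$.99", "Shipping:", "Shipping:", "Shipping:$.49"]
print([parse_shipping(v) for v in raw_values])  # [0.99, 0.0, 0.0, 0.49]
```

Returning 0.0 for a missing amount keeps every result a float, which may be more convenient downstream than mixing int and float values.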