I am trying to use the link parsing structure described by "warwaruk" in this SO thread: Following links, Scrapy web crawler framework
This works great when grabbing only a single item from each page. However, when I try to use a for loop to scrape all items within each page, the parse_item function appears to terminate upon reaching the first yield statement. I have a custom pipeline set up to handle each item, but currently it only receives one item per page.
Let me know if I need to include more code, or clarification. THANKS!
def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    prices = hxs.select("//div[contains(@class, 'item')]/script/text()").extract()
    for prices in prices:
        item = WalmartSampleItem()
        ...
        yield items
You should yield a single item inside the for loop: yield item, not items (there is no items variable in that scope, so the code raises a NameError). It is also better not to shadow the prices list with the loop variable:

    for price in prices:
        item = WalmartSampleItem()
        ...
        yield item
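The underlying behavior is worth spelling out: a function containing yield returns a generator, and each yield inside the loop produces one value without ending the function, so Scrapy keeps consuming items until the loop finishes. A minimal sketch in plain Python (no Scrapy, hypothetical data) illustrates this:

```python
def parse_item(prices):
    # Mimics a Scrapy callback: yields one dict per price found on a page.
    for price in prices:
        item = {"price": price}
        yield item  # does NOT terminate the function; the loop continues

# Scrapy iterates the generator and passes every yielded item to the pipeline.
items = list(parse_item(["$10", "$20", "$30"]))
print(len(items))  # 3 items, not 1
```

This is why fixing the yield statement is enough: once each iteration yields a valid item, the pipeline receives one item per price rather than stopping after the first.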