Total newbie, trying to get scrapy to read a list of urls from csv and return the items in a csv. Need some help to figure out where I'm going wrong here: Spider code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import random
class incyspider(BaseSpider):
name = "incyspider"
def __init__(self):
super(incyspider, self).__init__()
domain_name = "incyspider.co.uk"
f = open("urls.csv")
start_urls = [url.strip() for url in f.readlines()]
f.close
def parse(self, response):
hxs = HtmlXPathSelector(response)
sites = hxs.select('//div[@class="Product"]')
items = []
for site in sites:
item['title'] = hxs.select('//div[@class="Name"]/node()').extract()
item['hlink'] = hxs.select('//div[@class="Price"]/node()').extract()
item['price'] = hxs.select('//div[@class="Codes"]/node()').extract()
items.append(item)
return items
SPIDER = incyspider()
Here's the items.py code:
from scrapy.item import Item, Field
class incyspider(Item):
# define the fields for your item here like:
# name = Field()
title = Field()
hlink = Field()
price = Field()
pass
To run, I'm using
scrapy crawl incyspider -o items.csv -t csv
I would seriously appreciate any pointers.
I'm not exactly sure but after a quick look at your code I would say that at least you need to replace this line
sites = hxs.select('//div[@class="Product"]')
by this line
sites = hxs.select('//div[@class="Product"]').extract()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With