
scrapy: newbie attempting to debug code

Tags:

python

scrapy

I'm a total newbie trying to get Scrapy to read a list of URLs from a CSV file and write the scraped items to another CSV. I need some help figuring out where I'm going wrong. Here's the spider code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import random

class incyspider(BaseSpider):
    name = "incyspider"
    def __init__(self):
        super(incyspider, self).__init__()
        domain_name = "incyspider.co.uk"
        f = open("urls.csv")
        start_urls = [url.strip() for url in f.readlines()]
        f.close

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="Product"]')
        items = []
        for site in sites:
            item['title'] = hxs.select('//div[@class="Name"]/node()').extract()
            item['hlink'] = hxs.select('//div[@class="Price"]/node()').extract()
            item['price'] = hxs.select('//div[@class="Codes"]/node()').extract()
            items.append(item)

        return items

SPIDER = incyspider()

Here's the items.py code:

from scrapy.item import Item, Field

class incyspider(Item):
    # define the fields for your item here like:
    # name = Field()
    title = Field()
    hlink = Field()
    price = Field()
    pass

To run it, I'm using:

scrapy crawl incyspider -o items.csv -t csv

I would seriously appreciate any pointers.

Mark P asked Dec 02 '25 09:12


1 Answer

I'm not entirely sure, but after a quick look at your code I'd say that, at the very least, you need to replace this line

sites = hxs.select('//div[@class="Product"]')

with this line

sites = hxs.select('//div[@class="Product"]').extract() 
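
A few other things in the spider also look like they could be causing trouble, though I haven't run it. `start_urls` is assigned to a local variable inside `__init__` rather than to `self.start_urls`, `f.close` is missing its parentheses so the file is never closed explicitly, `item` is never created before it is filled in, and the XPaths inside the loop start with `//`, so they search the whole page instead of the current product. The sketch below illustrates those fixes in isolation. It deliberately uses the standard library's `csv` and `xml.etree.ElementTree` instead of Scrapy's selectors, purely so it runs without Scrapy installed. The helper names `load_start_urls` and `parse_products`, the `SAMPLE` markup, and the choice of fields are my own assumptions, not anything taken from your project.

```python
import csv
import xml.etree.ElementTree as ET


def load_start_urls(path):
    """Return the non-empty URLs found in the first column of a CSV file."""
    # The with-block closes the file for us, so there is no f.close() to forget.
    with open(path, newline="") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def parse_products(markup):
    """Build one dict per <div class="Product">, querying relative to it."""
    root = ET.fromstring(markup.strip())
    items = []
    for site in root.findall(".//div[@class='Product']"):
        item = {}  # create a fresh item for every product
        # The leading "." keeps each lookup inside the current product div.
        item["title"] = site.findtext(".//div[@class='Name']", default="").strip()
        item["price"] = site.findtext(".//div[@class='Price']", default="").strip()
        items.append(item)
    return items


# Hypothetical, well-formed sample page with two products.
SAMPLE = """
<html><body>
  <div class="Product"><div class="Name">Widget</div><div class="Price">1.50</div></div>
  <div class="Product"><div class="Name">Gadget</div><div class="Price">2.75</div></div>
</body></html>
"""

if __name__ == "__main__":
    for item in parse_products(SAMPLE):
        print(item)
```

In the spider itself, the equivalent changes would presumably be to assign the list to `self.start_urls`, to create a new item object at the top of each loop iteration, and to call `site.select('.//div[@class="Name"]/node()')` on the loop variable rather than `hxs.select('//...')`. It is also worth double-checking the field mapping: as posted, `hlink` is filled from the `Price` div and `price` from the `Codes` div, and the item class in items.py shares the name `incyspider` with the spider and is never imported into it.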
pemistahl answered Dec 05 '25 00:12