I'm using a loop to generate my requests inside start_requests(), and I'd like to pass the loop index to parse() so it can store it in the item. However, when I use self.i, every item ends up with the maximum value of i (the value from the last loop iteration). I could extract the index from response.url with a regex, but I wonder if there is a clean way to pass a variable from start_requests() to parse().
You can use the meta argument of scrapy.Request. The dict you pass is then available in the callback as response.meta:
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'

        def start_requests(self):
            urls = [...]
            for index, url in enumerate(urls):
                yield scrapy.Request(url, meta={'index': index})

        def parse(self, response):
            print(response.url)
            print(response.meta['index'])
You can pass the cb_kwargs argument to scrapy.Request(). Each key in the dict is passed to the callback as a keyword argument.
https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.cb_kwargs
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'

        def start_requests(self):
            urls = [...]
            for index, url in enumerate(urls):
                yield scrapy.Request(url, callback=self.parse, cb_kwargs={'index': index})

        def parse(self, response, index):
            pass
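As for why self.i shows the last value everywhere: the callbacks run after the requests have been generated, so by then the single spider attribute has already been overwritten. The following is a minimal sketch of that effect in plain Python, without Scrapy. FakeRequest, run() and the example.com URLs are hypothetical stand-ins invented for illustration, and the real Scrapy scheduling is more involved than "queue everything, then call back".

```python
# Hypothetical URLs, used only for this illustration.
URLS = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: a URL, a callback, per-request kwargs."""

    def __init__(self, url, callback, cb_kwargs=None):
        self.url = url
        self.callback = callback
        self.cb_kwargs = cb_kwargs or {}


class SharedAttributeSpider:
    """Keeps the index on the spider itself, so all callbacks read one shared value."""

    def start_requests(self):
        for index, url in enumerate(URLS):
            self.i = index  # overwritten on every iteration
            yield FakeRequest(url, self.parse)

    def parse(self, url):
        return {"url": url, "index": self.i}


class PerRequestSpider:
    """Attaches the index to each request, in the spirit of cb_kwargs."""

    def start_requests(self):
        for index, url in enumerate(URLS):
            yield FakeRequest(url, self.parse, cb_kwargs={"index": index})

    def parse(self, url, index):
        return {"url": url, "index": index}


def run(spider):
    # Simplified assumption: every request is queued before any callback runs.
    pending = list(spider.start_requests())
    return [request.callback(request.url, **request.cb_kwargs) for request in pending]


shared_items = run(SharedAttributeSpider())
per_request_items = run(PerRequestSpider())

print([item["index"] for item in shared_items])       # [2, 2, 2]
print([item["index"] for item in per_request_items])  # [0, 1, 2]
```

In this sketch the shared attribute yields the last index for every item, while the per-request value survives until its own callback runs, which is the reason meta and cb_kwargs behave as expected.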