
How to set default cookies for SitemapSpider?

Tags:

scrapy

I am trying to set my own headers and cookies when crawling using SitemapSpider:

import scrapy
from scrapy.spiders import SitemapSpider

class MySpider(SitemapSpider):
    name = 'myspider'
    sitemap_urls = ['https://www.sitemap-1.xml']
    headers = {'pragma': 'no-cache',}
    cookies = {"sdsd": "23234",}

    def _request_sitemaps(self, response):
        for url in self.sitemap_urls:
            yield scrapy.Request(url=url,headers=self.headers,cookies=self.cookies,callback=self._parse_sitemap)

    def parse(self, response, **cb_kwargs):
        print(response.css('title::text').get())

...but it doesn't work: the cookies and headers are not sent with the requests. How can I implement this?

m_sasha asked Sep 02 '25 17:09



1 Answer

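My solution: SitemapSpider never calls a method named _request_sitemaps, so overriding it has no effect. The sitemap requests are created in start_requests(), and the requests for the pages listed in the sitemap are created in _parse_sitemap(), so those are the methods to override: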

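from scrapy import Request
from scrapy.spiders import SitemapSpider


class MySpider(SitemapSpider):
    name = 'spider'
    sitemap_urls = ['https://www.sitemap-1.xml']
    headers = {'authority': 'www.example.com',}
    cookies = {"dsd": "jdjsj233",}

    def start_requests(self):
        # Requests for the sitemaps themselves also get the headers and cookies
        for url in self.sitemap_urls:
            yield Request(url, callback=self._parse_sitemap,
                          headers=self.headers, cookies=self.cookies)

    def _parse_sitemap(self, response):
        # The parent method extracts the sitemap body (decompressing it if
        # needed) and yields the requests; we only add headers and cookies.
        # Each request keeps its original callback, so nested sitemaps from
        # a sitemap index are still parsed, and page URLs go to the callback
        # set in sitemap_rules (parse by default).
        for request in super()._parse_sitemap(response):
            yield request.replace(headers=self.headers, cookies=self.cookies)

    def parse(self, response, **cb_kwargs):
        print(response.css('title::text').get())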
m_sasha answered Sep 05 '25 15:09



