I have multiple websites stored in a database, each with its own crawl interval (e.g., every 5 or 10 minutes per site). I have created a spider and run it with cron: it takes all the websites from the database and crawls them in parallel. How can I crawl each website on the schedule stored in the database? Is there any way to handle this in Scrapy?
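For illustration, assume each database record carries the site's URL, its crawl interval, and the time it is next due; the field names below are placeholders, not the actual schema:

from datetime import datetime

# one stored record (hypothetical layout, field names are assumptions)
site = {
    'url': 'https://example.com',
    'interval_minutes': 5,               # crawl this site every 5 minutes
    'next_crawl_at': datetime.utcnow(),  # when the site is next due
}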
Have you tried adding a scheduling component in start_requests? A rough sketch of the idea, assuming the sites live in a pymongo collection with url, interval_minutes, and next_crawl_at fields (all illustrative names):
import time
from datetime import datetime, timedelta
import scrapy

def start_requests(self):
    while True:
        # fetch every site whose next crawl time has passed
        # (url_db: assumed pymongo Database handle; field names are illustrative)
        for site in url_db['to_crawl'].find({'next_crawl_at': {'$lte': datetime.utcnow()}}):
            # push this site's next due time forward by its own interval
            url_db['to_crawl'].update_one(
                {'_id': site['_id']},
                {'$set': {'next_crawl_at': datetime.utcnow()
                          + timedelta(minutes=site['interval_minutes'])}})
            yield scrapy.Request(site['url'], callback=self.parse)
        if self.crawl_budget_reached():  # stand-in for the original "if enough" check
            break
        time.sleep(5)  # wait before polling for newly due sites
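One caveat with this approach: Scrapy pulls items from start_requests lazily on Twisted's reactor thread, so a time.sleep there stalls the whole crawler, including in-flight requests, for its duration; keeping the sleep short limits the impact. Instead of a fixed poll interval, you could also sleep just until the soonest due site, roughly like this (a sketch; assumes the collection is non-empty and uses the same illustrative field names):

nxt = url_db['to_crawl'].find_one(sort=[('next_crawl_at', 1)])
time.sleep(max(0.0, (nxt['next_crawl_at'] - datetime.utcnow()).total_seconds()))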