I'm a beginner in Python and Scrapy. I've just created a Scrapy project with multiple spiders, but when running "scrapy crawl ..." it runs only the first spider.
How can I run all spiders in the same process?
Thanks in advance.
Every spider in your project has its own name attribute, e.g. name = "yourspidername". When you call scrapy crawl yourspidername, it crawls only that spider; you have to run scrapy crawl yourotherspidername again to crawl the other one.
The other way is to list all the spiders in the same command, like scrapy crawl yourspidername,yourotherspidername, etc. (this method is not supported in newer versions of Scrapy).
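For instance, a project might declare its spiders roughly like the sketch below; the class names, spider names and URLs here are just placeholders, not anything from your project:

import scrapy

class YourSpider(scrapy.Spider):
    name = "yourspidername"   # what "scrapy crawl yourspidername" looks up
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield {"title": response.css("title::text").get()}

class YourOtherSpider(scrapy.Spider):
    name = "yourotherspidername"   # needs its own "scrapy crawl" invocation
    start_urls = ["https://example.org"]

    def parse(self, response):
        yield {"url": response.url}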
Everyone, even the docs, suggests using the internal API to write a "run script" that controls the start and stop of multiple spiders. However, this comes with a lot of caveats unless you get it absolutely right (feed exports not working, the Twisted reactor either not stopping or stopping too soon, etc.).
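For reference, such a run script is typically a small file along the lines of the sketch below, using Scrapy's CrawlerProcess (the file name run_all.py is just an assumption), and the caveats above still apply:

# run_all.py -- minimal sketch of the internal-API "run script" approach
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# schedule every spider registered in the project, then start the reactor once
for spider_name in process.spider_loader.list():
    process.crawl(spider_name)

process.start()   # blocks until all scheduled crawls have finished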
In my opinion, we already have a known, working and supported scrapy crawl x command, so a much easier way to handle this is to use GNU Parallel to parallelize.
After installing it, to run one Scrapy spider per core (from the shell), assuming you want to run all the spiders in your project:
scrapy list | parallel --line-buffer scrapy crawl
If you only have one core, you can play around with the --jobs argument to GNU Parallel. For example, the following will run 2 Scrapy jobs per core:
scrapy list | parallel --jobs 200% --line-buffer scrapy crawl