Scrapy raises ReactorNotRestartable when CrawlerProcess is run twice

Question

I have some code which looks something like this:

from scrapy.crawler import CrawlerProcess

def run(spider_name, settings):
    runner = CrawlerProcess(settings)
    runner.crawl(spider_name)
    runner.start()  # blocks until the crawl finishes
    return True

I have two py.test tests which each call run(). When the second test executes, I get the following error:

    runner.start()
../../.virtualenvs/scrape-service/lib/python3.6/site-packages/scrapy/crawler.py:291: in start
    reactor.run(installSignalHandlers=False)  # blocking call
../../.virtualenvs/scrape-service/lib/python3.6/site-packages/twisted/internet/base.py:1242: in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
../../.virtualenvs/scrape-service/lib/python3.6/site-packages/twisted/internet/base.py:1222: in startRunning
    ReactorBase.startRunning(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <twisted.internet.selectreactor.SelectReactor object at 0x10fe21588>

    def startRunning(self):
        """
            Method called when reactor starts: do some initialization and fire
            startup events.

            Don't call this directly, call reactor.run() instead: it should take
            care of calling this.

            This method is somewhat misnamed.  The reactor will not necessarily be
            in the running state by the time this method returns.  The only
            guarantee is that it will be on its way to the running state.
            """
        if self._started:
            raise error.ReactorAlreadyRunning()
        if self._startedBefore:
>           raise error.ReactorNotRestartable()
E           twisted.internet.error.ReactorNotRestartable

I understand that the reactor has already been started once, so I cannot call runner.start() again when the second test runs. But is there some way to reset its state between the tests, so that they are more isolated and can actually run one after another?
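For context, the check at the bottom of the traceback can be modeled with two flags. The sketch below is a toy stand-in, not Twisted's code: the class name ToyReactor and the local exception classes are assumptions made for illustration, and only the two `if` checks mirror what the traceback shows. It illustrates why the second call fails even though the reactor is no longer running.

```python
class ReactorAlreadyRunning(Exception):
    """Stand-in for twisted.internet.error.ReactorAlreadyRunning."""


class ReactorNotRestartable(Exception):
    """Stand-in for twisted.internet.error.ReactorNotRestartable."""


class ToyReactor:
    """Toy model of the two flags checked in startRunning(); not Twisted's code."""

    def __init__(self):
        self._started = False        # True only while the reactor is running
        self._startedBefore = False  # True forever after the first run

    def run(self):
        # These two checks mirror the ones visible in the traceback.
        if self._started:
            raise ReactorAlreadyRunning()
        if self._startedBefore:
            raise ReactorNotRestartable()
        self._started = True
        self._startedBefore = True
        # A real reactor would block here, processing events until stopped.
        self._started = False


reactor = ToyReactor()
reactor.run()  # the first run succeeds

try:
    reactor.run()  # the second run in the same process is rejected
except ReactorNotRestartable:
    second_run_failed = True
else:
    second_run_failed = False
```

In this model nothing ever clears `_startedBefore`, which matches the behaviour seen in the second test: the flag survives for the lifetime of the process.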

Jean-Paul Calderone · Accepted Answer

According to the Scrapy docs:

By default, Scrapy runs a single spider per process when you run scrapy crawl. However, Scrapy supports running multiple spiders per process using the internal API.

For example:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

process = CrawlerProcess()
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start() # the script will block here until all crawling jobs are finished

If you want to run another spider after you've called process.start(), then I expect you can just issue another process.crawl(SomeSpider) call at the point in your program where you determine the need to do this.

Examples of other scenarios are given in the docs.
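If the tests really do need to be isolated from one another, one possible workaround is to run each crawl in a fresh Python interpreter, so that every crawl gets its own reactor. This is only a minimal sketch under assumptions, not something taken from the docs quoted above. The helper name run_in_fresh_interpreter and the CRAWL_SCRIPT constant are hypothetical, and the child script assumes a Scrapy project whose settings get_project_settings() can locate, rather than the settings argument used in the question.

```python
import subprocess
import sys

# Hypothetical child script: assumes it runs inside a Scrapy project and
# receives the spider name as its first command-line argument.
CRAWL_SCRIPT = """
import sys
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl(sys.argv[1])
process.start()  # blocks until the crawl finishes
"""


def run_in_fresh_interpreter(script, *args):
    """Run `script` in a new Python process; return True if it exits with 0."""
    completed = subprocess.run([sys.executable, "-c", script, *args])
    return completed.returncode == 0


def run(spider_name):
    """Run one crawl per child process, so no reactor is ever restarted."""
    return run_in_fresh_interpreter(CRAWL_SCRIPT, spider_name)
```

Each test can then call run("some_spider") independently (the spider name here is a placeholder). The trade-off is that results have to come back through the exit code, files, or some other channel, since the crawl no longer shares memory with the test process.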

Tags: python, pytest, scrapy, twisted

Joe Roe
