I have some code which looks something like this:
from scrapy.crawler import CrawlerProcess

def run(spider_name, settings):
    runner = CrawlerProcess(settings)
    runner.crawl(spider_name)
    runner.start()
    return True
I have two py.test tests which each call run(). When the second test executes, I get the following error:
runner.start()
../../.virtualenvs/scrape-service/lib/python3.6/site-packages/scrapy/crawler.py:291: in start
reactor.run(installSignalHandlers=False) # blocking call
../../.virtualenvs/scrape-service/lib/python3.6/site-packages/twisted/internet/base.py:1242: in run
self.startRunning(installSignalHandlers=installSignalHandlers)
../../.virtualenvs/scrape-service/lib/python3.6/site-packages/twisted/internet/base.py:1222: in startRunning
ReactorBase.startRunning(self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <twisted.internet.selectreactor.SelectReactor object at 0x10fe21588>
def startRunning(self):
"""
Method called when reactor starts: do some initialization and fire
startup events.
Don't call this directly, call reactor.run() instead: it should take
care of calling this.
This method is somewhat misnamed. The reactor will not necessarily be
in the running state by the time this method returns. The only
guarantee is that it will be on its way to the running state.
"""
if self._started:
raise error.ReactorAlreadyRunning()
if self._startedBefore:
> raise error.ReactorNotRestartable()
E twisted.internet.error.ReactorNotRestartable
I understand that this reactor is already running, so I cannot call runner.start() again when the second test runs. But is there some way to reset its state in between the tests, so they are more isolated and can actually run one after another?
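For reference, the tests are roughly shaped like this (the import path, spider names, and settings here are made up for illustration):

# Hypothetical test module; adjust the import to wherever run() lives.
from myproject.crawl import run

def test_spider_one():
    assert run("spider_one", {"LOG_ENABLED": False})

def test_spider_two():
    # Fails with ReactorNotRestartable: the reactor started by the first
    # test cannot be started a second time in the same process.
    assert run("spider_two", {"LOG_ENABLED": False})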
According to the Scrapy docs:
By default, Scrapy runs a single spider per process when you run scrapy crawl. However, Scrapy supports running multiple spiders per process using the internal API.
For example:
import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

process = CrawlerProcess()
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start()  # the script will block here until all crawling jobs are finished
If you want to run another spider after you've called process.start(), then I expect you can just issue another process.crawl(SomeSpider) call at the point in your program where you determine the need to do this.
Examples of other scenarios are given in the docs.
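For instance, one of the scenarios shown in the docs runs spiders sequentially on the same reactor by chaining the deferreds that CrawlerRunner.crawl() returns. A sketch along those lines, assuming MySpider1 and MySpider2 are defined as above:

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

configure_logging()
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    # Each yield waits for the previous crawl to finish before starting the next.
    yield runner.crawl(MySpider1)
    yield runner.crawl(MySpider2)
    reactor.stop()

crawl()
reactor.run()  # the script will block here until the last crawl call is finished

Note that this starts the reactor only once; the individual crawls are queued and chained rather than each trying to start its own reactor.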