We have a fairly ordinary Scrapy project, laid out something like this:
project/
    setup.py
    scrapy.cfg
    SOME_DIR_WITH_PYTHON_MODULE/
        __init__.py
    project/
        settings.py
        pipelines.py
        __init__.py
        spiders/
            __init__.py
            somespider.py
Everything works fine when we run it from the command line with scrapy crawl somespider.
But when we deploy it and run it under Scrapyd, it fails to import the code from SOME_DIR_WITH_PYTHON_MODULE. It looks like Scrapyd cannot see the code there, and we don't know why.
We tried importing it in pipelines.py, first like this:
from project.SOME_DIR_WITH_PYTHON_MODULE import *
and then like this:
from SOME_DIR_WITH_PYTHON_MODULE import *
Neither worked under Scrapyd, although the import works when the spider is run directly from the command line with scrapy crawl.
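One way to narrow this down is to check whether the directory actually ends up in the egg that gets deployed. Scrapyd deployments ship the project as a Python egg, which is a zip archive, so its contents can be listed with the standard library. The sketch below is only an illustration: the helper names top_level_entries and egg_contains are made up for this example, and the demo builds a throwaway archive rather than reading a real deployment.

```python
import os
import tempfile
import zipfile


def top_level_entries(egg_path):
    """Return the sorted top-level names found inside an egg (zip) archive."""
    with zipfile.ZipFile(egg_path) as egg:
        return sorted({name.split("/", 1)[0] for name in egg.namelist()})


def egg_contains(egg_path, package_name):
    """True if package_name appears as a top-level entry of the archive."""
    return package_name in top_level_entries(egg_path)


if __name__ == "__main__":
    # Build a fake egg for demonstration; a real one would come from the build step.
    with tempfile.TemporaryDirectory() as tmp:
        egg_path = os.path.join(tmp, "demo.egg")
        with zipfile.ZipFile(egg_path, "w") as egg:
            egg.writestr("project/__init__.py", "")
            egg.writestr("EGG-INFO/PKG-INFO", "")
        print(top_level_entries(egg_path))
        print(egg_contains(egg_path, "SOME_DIR_WITH_PYTHON_MODULE"))
```

If the directory is missing from the listing, the problem is in how the egg is packaged rather than in the import statements.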
What should we do to make it work?
Thanks!
Actually, I found the cause. I should have used the data_files parameter in setup.py:
import itertools
import os

from setuptools import setup, find_packages

setup(
    name='blabla',
    version='1.0',
    packages=find_packages(),
    entry_points={'scrapy': ['settings = blabla.settings']},
    zip_safe=False,
    include_package_data=True,
    # Ship every file under these directories as (directory, [files]) pairs.
    data_files=[(root, [os.path.join(root, f) for f in files])
         for root, _, files in itertools.chain(os.walk('monitoring'),
                                               os.walk('blabla/data'))],
    install_requires=[
        "Scrapy>=0.22",
    ],
    extras_require={
        'Somemodule': ["numpy"],
    }
)
That's a bit odd, since it means the code is being shipped as data, but it worked for us.
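To see what that data_files expression evaluates to, here is a small standalone sketch that runs the same comprehension over a temporary directory. The function name collect_data_files and the file names in the demo are invented for illustration; only the comprehension itself comes from the setup() call above.

```python
import itertools
import os
import tempfile


def collect_data_files(*dirs):
    """Build one (directory, [file paths]) pair for every directory walked."""
    walks = itertools.chain.from_iterable(os.walk(d) for d in dirs)
    return [(root, [os.path.join(root, f) for f in files])
            for root, _, files in walks]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: monitoring/__init__.py and monitoring/checks/ping.py
        pkg = os.path.join(tmp, "monitoring")
        os.makedirs(os.path.join(pkg, "checks"))
        for rel in ("__init__.py", os.path.join("checks", "ping.py")):
            open(os.path.join(pkg, rel), "w").close()
        for target_dir, paths in collect_data_files(pkg):
            print(target_dir, paths)
```

Each subdirectory gets its own entry, so nested files keep their relative location when they are listed this way.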
Thanks for the attention. Solved.