I need to scrape a web page that is a JavaScript-rendered AngularJS app. The site's developers detect Safari/Firefox in private browsing mode and block it, which also blocks scraping. The page works in Safari/Firefox when you are not in private mode.
The interesting thing is that no such warning is given when using Chrome whether in private mode or not. I was using Scrapy+Selenium, but I was really hoping to use ScrapyJS/Splash for this project. However, it looks like the Scrapy/Splash combination suffers from the website's private browsing wall.
Is it possible to tell Scrapy to use Chrome? I know Selenium has quite a few drivers, and how to use each is pretty well documented, but I can't find any info on whether Scrapy supports other browsers or whether someone else has already done this. Google/SO searches haven't turned up anything either.
Starting from Splash 2.0, you can disable Private mode (which is "on" by default).
There are two ways to go about it:
At startup, with the --disable-private-mode argument, e.g. if you're using Docker:
$ sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash --disable-private-mode
At runtime, when using the /execute endpoint, by setting splash.private_mode_enabled = false in the Lua script.
Also, take note of the effect of disabling private mode:
Note that if you disable private mode browsing data such as cookies or items kept in local storage may persist between requests.
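As a rough illustration of the runtime option, here is a minimal sketch that posts a Lua script to the /execute endpoint using only the Python standard library. It assumes Splash is listening on http://localhost:8050 (as in the Docker command above); the helper names build_execute_request and fetch_rendered_html, the one-second wait, and the target URL are made up for the example, not part of Splash or Scrapy.

```python
import json
import urllib.request

# Lua script run by Splash; it turns private mode off before loading the page.
# The 1.0 second wait is an arbitrary guess at how long the app needs to render.
LUA_SCRIPT = """
function main(splash)
    splash.private_mode_enabled = false
    assert(splash:go(splash.args.url))
    assert(splash:wait(1.0))
    return splash:html()
end
"""


def build_execute_request(splash_url, target_url):
    """Build the POST request for Splash's /execute endpoint (hypothetical helper)."""
    payload = {"lua_source": LUA_SCRIPT, "url": target_url}
    return urllib.request.Request(
        splash_url.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_rendered_html(splash_url, target_url, timeout=60):
    """Send the request and return the rendered HTML as text (hypothetical helper)."""
    request = build_execute_request(splash_url, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a Splash instance running, fetch_rendered_html("http://localhost:8050", "http://example.com") should return the page's HTML after JavaScript has run. From a Scrapy project the same Lua script would be sent to the execute endpoint through your Splash integration rather than urllib.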