I am learning Scrapy, a web crawling framework.
I know I can set USER_AGENT in the settings.py file of a Scrapy project. When I run Scrapy, I can see the USER_AGENT value in the INFO logs.
This USER_AGENT is sent with every download request to the server I want to crawl.
However, I am rotating multiple USER_AGENT values at random with the help of this solution. I assume the randomly chosen USER_AGENT is being used, but I want to confirm it. How can I make Scrapy show the USER_AGENT for each download request, so that I can see its value in the logs?
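One way to check this is a small downloader middleware that logs the header just before the request goes out. The following is only a minimal sketch: the class name LogUserAgentMiddleware and the helper get_user_agent are made up for illustration, and it assumes the request object exposes .url and .headers the way Scrapy requests do (header values usually come back as bytes).

```python
import logging

logger = logging.getLogger(__name__)


def get_user_agent(request):
    """Return the request's User-Agent header as text, or None if it is not set."""
    value = request.headers.get("User-Agent")
    if isinstance(value, bytes):
        # Header values are typically stored as bytes, so decode for readable logs.
        value = value.decode("utf-8", errors="replace")
    return value


class LogUserAgentMiddleware:
    """Hypothetical downloader middleware: logs the User-Agent of every request."""

    def process_request(self, request, spider):
        logger.info("User-Agent for %s: %s", request.url, get_user_agent(request))
        # Returning None lets the request continue through the remaining middlewares.
        return None
```

To try it, the class would be registered in DOWNLOADER_MIDDLEWARES in settings.py (for example under a hypothetical path such as "myproject.middlewares.LogUserAgentMiddleware") with an order number higher than that of the middleware which picks the random user agent, so that it logs the final value.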
Use the set option to change the USER_AGENT value for a single fetch request. To set the user agent permanently for your Scrapy spider, open the settings file (settings.py) of your Scrapy project in your preferred text editor, find the USER_AGENT option, uncomment the line, and set it to the user agent of your choice.
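As a rough illustration of both options, the settings.py fragment below sets a fixed value, and the comment shows a one-off override from the command line. The user-agent string and the example.com URL are placeholders, not values from the original post.

```python
# settings.py (fragment)

# Placeholder user-agent string; replace it with the one you want to send.
USER_AGENT = "Mozilla/5.0 (compatible; MyCrawler/1.0; +https://example.com/bot)"

# One-off override for a single fetch, without editing this file:
#   scrapy fetch -s USER_AGENT="MyCrawler/1.0" https://example.com
```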
That's where WhatIsMyBrowser.com steps in - we decode your user agent string to figure out everything it's saying. Check out our user agent analyser page, which gives you a neat breakdown of all the things we can tell you about your browser and computer based on your user agent.
The User-Agent request header is a characteristic string that lets servers and network peers identify the application, operating system, vendor, and/or version of the requesting user agent.
Just FYI.
I've implemented a simple RandomUserAgentMiddleware based on fake-useragent.
Thanks to fake-useragent, you don't need to configure the list of User-Agents - it picks them up based on browser usage statistics from a real-world database.
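For readers who want to see the general idea, here is a minimal sketch of such a middleware. It is not the actual code of the middleware mentioned above: the class name RandomUserAgentSketchMiddleware and the pick_user_agent parameter are invented for this example, and it assumes the fake-useragent package is installed and provides UserAgent().random.

```python
class RandomUserAgentSketchMiddleware:
    """Hypothetical downloader middleware: sets a random User-Agent on each request."""

    def __init__(self, pick_user_agent=None):
        if pick_user_agent is None:
            # Imported lazily so the class can be used (and tested) with a custom picker.
            from fake_useragent import UserAgent  # third-party package

            ua = UserAgent()

            def pick_user_agent():
                return ua.random

        self.pick_user_agent = pick_user_agent

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent was set before this middleware ran.
        request.headers["User-Agent"] = self.pick_user_agent()
        return None
```

Passing the picker in as a parameter is just a design choice for this sketch: it makes it easy to swap in a fixed list of user agents or a stub while checking that the header is really being replaced.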