I'm learning how to web scrape with Python and I'm wondering if it's possible to grab two pages with a single requests.get() call, so I don't need two separate calls and variables. For example:
r1 = requests.get("page1")
r2 = requests.get("page2")
pg1 = BeautifulSoup(r1.content, "html.parser")
pg2 = BeautifulSoup(r2.content, "html.parser")
As you can see there's repeated code. Any way around this? Thanks!
I like the grequests library for fetching multiple URLs at one time instead of requests, especially when dealing with a lot of URLs or a single site with many sub-pages.
import grequests
urls = ['http://google.com', 'http://yahoo.com', 'http://bing.com']
unsent_request = (grequests.get(url) for url in urls)
results = grequests.map(unsent_request)
After this, results can be processed however you need. The responses come back in the same order as the input URLs, so results[0] is the data for the first URL, results[1] for the second, and so on. This works well with JSON data too.
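If installing grequests isn't an option, the standard library's concurrent.futures gives a similar "fetch many at once, keep the input order" pattern. A minimal sketch, where fetch() is a placeholder that just echoes the URL so the example runs offline; in real use it would be requests.get(url):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder for requests.get(url) so this sketch needs no network.
    return f"response for {url}"

urls = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# map() runs fetch() on each URL in worker threads and yields the
# results in the same order as the input list.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fetch, urls))

print(results[0])  # response for http://google.com
```

Like grequests.map, pool.map preserves ordering, so indexing into results lines up with the original URL list.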
More can be found in the grequests documentation.
You can combine sequence unpacking with a list comprehension, although it isn't much shorter with only two pages.
pg1, pg2 = [BeautifulSoup(requests.get(page).content, "html.parser")
            for page in ["page1", "page2"]]
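The same unpacking pattern can be tried without any network access by parsing in-memory HTML strings (the two sample documents below are made up for illustration):

```python
from bs4 import BeautifulSoup

# Two hypothetical pages as raw HTML, standing in for requests.get(...).content
docs = ["<html><title>One</title></html>",
        "<html><title>Two</title></html>"]

# Unpack the comprehension's results straight into two variables.
pg1, pg2 = [BeautifulSoup(doc, "html.parser") for doc in docs]

print(pg1.title.string)  # One
print(pg2.title.string)  # Two
```

Note that this raises a ValueError if the number of pages doesn't match the number of variables on the left, which is a useful sanity check here.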