 

Python requests .get() from multiple pages?

I'm learning how to web scrape with Python, and I'm wondering if it's possible to grab two pages with requests.get() so that I don't have to make two separate calls and variables. For example:

import requests
from bs4 import BeautifulSoup

r1 = requests.get("page1")
r2 = requests.get("page2")

pg1 = BeautifulSoup(r1.content, "html.parser")
pg2 = BeautifulSoup(r2.content, "html.parser")

As you can see, there's repeated code. Is there any way around this? Thanks!

asked Oct 26 '25 by dj1121

2 Answers

I like the grequests library for fetching multiple URLs at once instead of requests, especially when dealing with a lot of URLs or a single URL with many sub-pages.

import grequests

urls = ['http://google.com', 'http://yahoo.com', 'http://bing.com']
unsent_requests = (grequests.get(url) for url in urls)
results = grequests.map(unsent_requests)

After this, results can be processed however you need. The list preserves the order of the input URLs, so this works well with JSON data: results[0] holds the data for the first URL, results[1] for the second, and so on.
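One thing to watch for when processing the results: grequests.map() yields None for any request that failed, so it's worth guarding each entry before using it. Here's a minimal sketch of that pattern, using a hypothetical StubResponse class as a stand-in for real response objects so the example runs without network access:

```python
# StubResponse is a stand-in for a real requests/grequests response;
# it only mimics the attributes this sketch uses.
class StubResponse:
    def __init__(self, url, status_code, payload):
        self.url = url
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload

# What grequests.map() might hand back: one failed request becomes None.
results = [
    StubResponse("http://google.com", 200, {"ok": True}),
    None,  # a request that raised an exception maps to None
    StubResponse("http://bing.com", 200, {"ok": True}),
]

# Collect JSON data keyed by URL, skipping failures.
data = {}
for r in results:
    if r is None or r.status_code != 200:
        continue
    data[r.url] = r.json()

print(sorted(data))  # ['http://bing.com', 'http://google.com']
```

The same guard-then-use loop applies unchanged to real grequests results.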

More can be found in the grequests documentation.

answered Oct 28 '25 by Aaron Nelson


You can use iterable unpacking with a list comprehension, although it isn't much shorter with only two pages.

pg1, pg2 = [BeautifulSoup(requests.get(page).content, "html.parser")
            for page in ["page1", "page2"]]
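If you have more than a couple of pages, a dict comprehension keyed by URL scales better than unpacking into named variables. Here's an offline sketch of that idea: fetch() is a hypothetical stand-in for requests.get(url).content, and the parsing uses the stdlib html.parser instead of BeautifulSoup so the example is self-contained; swap in real requests and BeautifulSoup calls as needed.

```python
from html.parser import HTMLParser


def fetch(url):
    # Stand-in for requests.get(url).content; returns canned HTML.
    return f"<html><head><title>Title of {url}</title></head></html>"


class TitleParser(HTMLParser):
    # Tiny parser that just captures the <title> text.
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse(html):
    p = TitleParser()
    p.feed(html)
    return p


# One comprehension fetches and parses every page, keyed by URL.
pages = {url: parse(fetch(url)) for url in ["page1", "page2", "page3"]}
print(pages["page1"].title)  # Title of page1
```

With real pages this would be pages = {url: BeautifulSoup(requests.get(url).content, "html.parser") for url in urls}, and you look results up by URL instead of inventing a variable name per page.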
answered Oct 28 '25 by Benedict Randall Shaw


