Hello I am using the requests module and I would like to improve the speed because I have many urls so I suppose I can use threading to have a better speed. Here is my code :
import requests
urls = ["http://www.google.com", "http://www.apple.com", "http://www.microsoft.com", "http://www.amazon.com", "http://www.facebook.com"]
for url in urls:
reponse = requests.get(url)
value = reponse.json()
But I don't know how to use requests with threading ...
Could you help me please ?
Thank you !
Just to add from bashrc, you can also use it with requests. You don't need to use urllib.request method.
it would be something like :
from concurrent import futures
URLS = ['http://www.foxnews.com/',
'http://www.cnn.com/',
'http://europe.wsj.com/',
'http://www.bbc.co.uk/',
'http://some-made-up-domain.com/']
with futures.ThreadPoolExecutor(max_workers=5) as executor: ## you can increase the amount of workers, it would increase the amount of thread created
res = executor.map(requests.get,URLS)
responses = list(res) ## the future is returning a generator. You may want to turn it to list.
What I like to do however, it is to create a function that returns directly the json from the response (or the text if you want to scrape). And use that function in the threadpool
import requests
from concurrent import futures
URLS = ['http://www.foxnews.com/',
'http://www.cnn.com/',
'http://europe.wsj.com/',
'http://www.bbc.co.uk/',
'http://some-made-up-domain.com/']
def getData(url):
res = requests.get(url)
try:
return res.json()
except:
return res.text
with futures.ThreadPoolExecutor(max_workers=5) as executor:
res = executor.map(getData,URLS)
responses = list(res) ## your list will already be pre-formated
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With