
Combine Python thread results into one list

I am trying to modify the solution shown here: What is the fastest way to send 100,000 HTTP requests in Python? except that instead of checking the header status, I am making an API request that returns a dictionary, and I would like the end result of all of these API requests to be a list of all of the dictionaries.

Here is my code -- consider that api_calls is a list holding each URL to open for the JSON request...

import json
import sys
from queue import Queue
from threading import Thread
from urllib.request import urlopen

concurrent = 200

def doWork():
    # Each worker pulls URLs off the queue until the main thread exits.
    while True:
        url = q.get()
        result = makeRequest(url)
        doSomethingWithResult(result, url)
        q.task_done()

def makeRequest(ourl):
    try:
        api_call = urlopen(ourl).read()
        result = json.loads(api_call)
        return result, ourl
    except Exception:
        return "error", ourl

def doSomethingWithResult(result, url):
    print(url, result)

# api_calls is a list of URL strings, defined elsewhere
q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for url in api_calls:
        q.put(url)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)

Like the linked example, this currently prints the url, result pair on each line. What I would like to do instead is add each (url, result) to a list in every thread and then join them into one master list at the end. I cannot figure out how to maintain this master list and combine the results once the threads finish. Can anybody help with what I should modify in doSomethingWithResult? If I were doing one large loop, I would just have an empty list and append the result to it after each API request, but I do not know how to mimic this now that I am using threads.
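For reference, the plain single-loop version I have in mind would look like this (using makeRequest as defined above):

results = []
for url in api_calls:
    results.append(makeRequest(url))  # each entry is a (result, url) tuple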

I expect that a common response will be to use https://en.wikipedia.org/wiki/Asynchronous_I/O and if this is the suggestion, then I would appreciate somebody actually providing an example that accomplishes as much as the code I have linked above.

asked by reese0106


1 Answer

Use a ThreadPool instead. It does the heavy lifting for you. Here is a working example that fetches a few URLs.

import json
import multiprocessing.pool
from urllib.request import urlopen

concurrent = 200

def makeRequest(ourl):
    try:
        api_call = urlopen(ourl).read()
        result = json.loads(api_call)
        return result, ourl
    except Exception:
        return "error", ourl

def main():
    api_calls = [
        'http://jsonplaceholder.typicode.com/posts/{}'.format(i)
        for i in range(1, 5)]

    # a thread pool that implements the process pool API
    pool = multiprocessing.pool.ThreadPool(processes=concurrent)
    return_list = pool.map(makeRequest, api_calls, chunksize=1)
    pool.close()
    for result, url in return_list:
        print(url, result)

main()
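Note that pool.map blocks until every request has finished and returns the results in the same order as api_calls, so return_list is exactly the master list you were looking for. If you prefer the newer standard-library API, a minimal equivalent sketch using concurrent.futures (my own addition, assuming makeRequest as defined above) would be:

import concurrent.futures

def fetch_all(api_calls, workers=200):
    # executor.map, like pool.map, returns results in input order
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(makeRequest, api_calls))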
answered by tdelaney