I am not sure how to do multithreading, and after reading a few Stack Overflow answers, I came up with this. Note: Python 2.7
from multiprocessing.pool import ThreadPool as Pool

pool_size = 10
pool = Pool(pool_size)

for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        try:
            async_result = pool.apply_async(describe_with_directory_workspaces,
                                            (region, dir, username))
            result = async_result.get()
            code = result[0]
            content = result[1]
        except Exception as e:
            print "Some error happening"
            print e
        if code == 0:
            if content:
                new_content.append(content)
            else:
                pass
        else:
            return error_html(environ, start_response, content)
What I am trying to do here is call describe_with_directory_workspaces with different combinations of region and directory, running the calls in parallel so that new_content fills up quickly. Currently the calls run in series, which is what gives the end user the slow performance.
Am I doing this right, or is there a better way to do it? How can I confirm that the multithreading is actually running as I expect?
You don't want to call async_result.get() until you've queued all of your jobs; otherwise you will only allow one job to run at a time.
Try queueing all of your jobs first, then processing each result after they've all been queued. Something like:
results = []
for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        result = pool.apply_async(describe_with_directory_workspaces,
                                  (region, dir, username))
        results.append(result)

for result in results:
    code, content = result.get()
    if code == 0:
        # ...
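This also answers how to confirm the pool is really running jobs in parallel: log the thread name inside the worker and time the whole batch. With a working pool you should see several different thread names interleaved, and the total time should be close to one job's duration rather than the sum of all of them. A minimal sketch, using a hypothetical fake_describe stand-in for describe_with_directory_workspaces:

import time
import threading
from multiprocessing.pool import ThreadPool

def fake_describe(region, dir_id):
    # stand-in for describe_with_directory_workspaces
    print("%s starting %s/%s" % (threading.current_thread().name, region, dir_id))
    time.sleep(1)  # simulate one slow API call
    return 0, "%s/%s" % (region, dir_id)

pool = ThreadPool(10)
start = time.time()
results = [pool.apply_async(fake_describe, (r, d))
           for r in ("us-east-1", "eu-west-1")
           for d in ("d-1", "d-2")]
for res in results:
    res.get()
print("total: %.1fs" % (time.time() - start))  # ~1s if parallel, ~4s if serial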
If you want to handle the results in an asynchronous callback, you can supply a callback argument to pool.apply_async as documented here:
for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        def done_callback(result):
            pass  # process result...
        pool.apply_async(describe_with_directory_workspaces,
                         (region, dir, username),
                         callback=done_callback)  # pass by keyword: the third positional arg is kwds
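Note that callbacks run on a pool-internal thread, so they only fire while the pool is alive; if you are collecting results through callbacks, close and join the pool before reading whatever the callbacks built up:

pool.close()  # no more jobs may be submitted
pool.join()   # blocks until every queued job (and its callback) has finished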
You should look into Python's multiprocessing module.
From Python: Essential Reference by Beazley:
"Python threads are actually rather restricted. Although minimally thread-safe, the Python interpreter uses an internal Global Interpreter Lock that only allows a single Python thread to execute at any given moment. This restricts python programs to run on a single processor regardless of how many CPU cores might be available on the system."
So: If you have a lot of CPU processing going on, use multiprocessing.
Link to the documentation: https://docs.python.org/2/library/multiprocessing.html
multiprocessing Pools might be useful in your case.
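To actually see the GIL effect Beazley describes, time the same CPU-bound function under a thread pool and under a process pool; on a multi-core machine only the process pool should show a speedup. A minimal sketch with a hypothetical busy-loop worker called burn:

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def burn(n):
    # pure-CPU work: threads serialize on the GIL, processes do not
    total = 0
    for i in xrange(n):
        total += i * i
    return total

if __name__ == '__main__':
    for label, pool in (("threads", ThreadPool(4)), ("processes", Pool(4))):
        start = time.time()
        pool.map(burn, [5000000] * 4)
        print("%s: %.2fs" % (label, time.time() - start))
        pool.close()
        pool.join()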
EDIT: I missed that the code is already using a pool from multiprocessing. See the comments for what might be a more helpful answer. Also, for an example of how to use apply_async with callbacks, see: Python multiprocessing.Pool: when to use apply, apply_async or map? Note that Pool also has a map_async function.
See section 16.6.2.9 at the above link.
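As an aside on map_async: it can replace the explicit apply_async loop when every job is one item of a sequence. Since Python 2's Pool has no starmap, the worker has to accept a single argument and unpack it itself. A minimal sketch with a hypothetical work wrapper:

from multiprocessing.pool import ThreadPool

def work(args):
    region, dir_id, username = args  # one argument per job, unpacked by hand
    return region, dir_id, username  # stand-in for the real describe call

pool = ThreadPool(10)
jobs = [(r, d, "user") for r in ("us-east-1", "eu-west-1") for d in ("d-1", "d-2")]
async_res = pool.map_async(work, jobs)
print(async_res.get())  # one list of results, in job order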
EDIT2: Example code that calls get() outside of the submission loop:
from multiprocessing import Pool

def sqr(x):
    return x * x

if __name__ == '__main__':
    po = Pool(processes=10)
    resultslist = []
    for i in range(1, 10):
        arg = [i]
        result = po.apply_async(sqr, arg)
        resultslist.append(result)

    for res in resultslist:
        print(res.get())
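Each res.get() blocks only until that particular job has finished, so the squares print in submission order (1, 4, 9, ..., 81) even though the ten worker processes may complete out of order.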