Multithreading in Python 2.7

I am not sure how to do multithreading, and after reading a few Stack Overflow answers I came up with the code below. Note: this is Python 2.7.

from multiprocessing.pool import ThreadPool as Pool

pool_size = 10
pool = Pool(pool_size)

for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        try:
            async_result = pool.apply_async(describe_with_directory_workspaces,
                                            (region, dir, username))
            result = async_result.get()
            code, content = result
        except Exception as e:
            print "Some error happened"
            print e
            continue  # code/content would be undefined past this point

        if code == 0:
            if content:
                new_content.append(content)
        else:
            return error_html(environ, start_response, content)

What I am trying to do here is call describe_with_directory_workspaces with different region and directory parameters and run the calls in parallel, so that the data lands in new_content quickly. Currently the calls run in series, which is what gives the end user slow performance.

Am I doing it right? Or is there a better way to do it? And how can I confirm that the multithreading is actually running the way I expect it to?

asked Feb 04 '26 by user3089927

2 Answers

You don't want to call async_result.get() until you've queued all of your jobs; otherwise you will only allow one job to run at a time.

Try queueing all of your jobs first, then processing each result after they've all been queued. Something like:

results = []
# First pass: queue every job without waiting on any of them.
for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        result = pool.apply_async(describe_with_directory_workspaces,
                                  (region, dir, username))
        results.append(result)

# Second pass: collect results; the pool has been working on all of
# the queued jobs concurrently in the meantime.
for result in results:
    code, content = result.get()
    if code == 0:
        # ...

If you want to handle the results in an asynchronous callback, you can supply a callback argument to pool.apply_async, as documented in the multiprocessing module documentation:

for region, directory_ids in direct_dict.iteritems():
    for dir in directory_ids:
        def done_callback(result):
            pass  # process result...

        # callback must be passed by keyword: the third positional
        # argument to apply_async is kwds, not callback.
        pool.apply_async(describe_with_directory_workspaces,
                         (region, dir, username),
                         callback=done_callback)
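
On the asker's last question (how to confirm the calls actually overlap): a quick check is to log the worker thread's name inside the job and time the whole batch. A minimal, self-contained sketch, where probe is a made-up stand-in for the real work:

import threading
import time
from multiprocessing.pool import ThreadPool

def probe(i):
    # Each pool worker runs on its own thread; several distinct names
    # interleaving in the output means the calls really overlap.
    print "%s started job %d" % (threading.current_thread().name, i)
    time.sleep(1)
    return i

if __name__ == '__main__':
    pool = ThreadPool(5)
    results = [pool.apply_async(probe, (i,)) for i in range(5)]
    print [r.get() for r in results]  # takes ~1 second total, not ~5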
answered Feb 06 '26 by Myk Willis


You should look into Python's multiprocessing module.

From Python: Essential Reference by Beazley:

"Python threads are actually rather restricted. Although minimally thread-safe, the Python interpreter uses an internal Global Interpreter Lock that only allows a single Python thread to execute at any given moment. This restricts python programs to run on a single processor regardless of how many CPU cores might be available on the system."

So: If you have a lot of CPU processing going on, use multiprocessing.

Link to the documentation: https://docs.python.org/2/library/multiprocessing.html

multiprocessing Pools might be useful in your case.
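
One rough way to see the GIL's effect yourself (my own sketch, with a made-up cpu_bound function): time the same CPU-bound work on a thread pool and on a process pool. On a multi-core machine the process pool should finish noticeably faster, while the thread pool gains little. For I/O-bound work such as remote API calls, threads remain a reasonable choice.

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL for the whole call.
    total = 0
    for i in xrange(n):
        total += i * i
    return total

if __name__ == '__main__':
    work = [2000000] * 8
    for label, pool_cls in (('threads', ThreadPool), ('processes', Pool)):
        p = pool_cls(4)
        start = time.time()
        p.map(cpu_bound, work)
        p.close()
        p.join()
        print "%s: %.2fs" % (label, time.time() - start)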

EDIT: I missed that the code was already using the multiprocessing module. See the comments for what might be a more helpful answer. Also, for an example of how to use apply_async with callbacks, see: Python multiprocessing.Pool: when to use apply, apply_async or map? Note that Pool also has a map_async function.

See section 16.6.2.9 at the link above.
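
Since map_async came up, here is a minimal sketch of it (my own illustration): it queues the whole iterable at once and returns a single AsyncResult covering the batch.

from multiprocessing import Pool

def sqr(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    # map_async queues every input at once; get() blocks until the
    # whole batch is done and returns the results in input order.
    async_result = pool.map_async(sqr, range(10))
    print async_result.get()  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]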

EDIT2: Example code that calls get() outside of the loop:

from multiprocessing import Pool

def sqr(x):
    return x * x

if __name__ == '__main__':
    po = Pool(processes=10)
    resultslist = []
    for i in range(1, 10):
        # Queue the job; apply_async returns immediately.
        result = po.apply_async(sqr, (i,))
        resultslist.append(result)

    # Block on each result only after everything has been queued.
    for res in resultslist:
        print(res.get())
answered Feb 06 '26 by sgrg