Distribute many independent, expensive operations over multiple cores in Python

Given a large list (1,000+) of completely independent objects that each need to be manipulated by some expensive function (~5 minutes each), what is the best way to distribute the work over other cores? Theoretically, I could just cut the list into equal parts, serialize the data with cPickle (which takes a few seconds), and launch a new Python process for each chunk (and it may just come to that if I intend to use multiple computers), but this feels like more of a hack than anything. Surely there is a more integrated way to do this using a multiprocessing library? Am I over-thinking this?
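Roughly what I mean by the hack, just as a sketch (big_list stands for the full list of objects, and worker.py is a hypothetical script that would unpickle a chunk, run the function on it, and save the results):

import cPickle
import subprocess

num_chunks = 8  # e.g. one chunk per core
chunks = [big_list[i::num_chunks] for i in range(num_chunks)]

procs = []
for i, chunk in enumerate(chunks):
    fname = "chunk_%d.pkl" % i
    with open(fname, "wb") as f:
        cPickle.dump(chunk, f, cPickle.HIGHEST_PROTOCOL)
    # worker.py would load the pickle, run the expensive function, and save its results
    procs.append(subprocess.Popen(["python", "worker.py", fname]))

for p in procs:
    p.wait()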

Thanks.

Asked by SkyNT on Oct 25 '25
1 Answer

This sounds like a good use case for a multiprocessing.Pool; depending on exactly what you're doing, it could be as simple as

import multiprocessing

pool = multiprocessing.Pool(num_procs)   # num_procs = number of worker processes
results = pool.map(the_function, list_of_objects)
pool.close()
pool.join()

This will pickle each object in the list independently. If that's a problem, there are various ways to get around it (though each has its own issues, and I don't know if any of them work on Windows). Since your computation time per item is fairly long, the pickling overhead is probably irrelevant.

Since 5 minutes x 1,000 items is roughly 83 hours of total computation divided across your cores, you probably want to save partial results along the way and print out some progress information. The easiest thing is probably to have the function you call save its results to a file or database or whatever; if that's not practical, you could also use apply_async in a loop and handle the results as they come in.
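For example, a minimal sketch of the apply_async route (the 5-minute computation is faked with a short sleep, and the results file name is arbitrary):

import multiprocessing
import time

def expensive_function(obj):
    # stand-in for the real ~5 minute computation
    time.sleep(1)
    return obj * 2

def save_result(result):
    # runs in the main process as each task finishes, so partial
    # results survive even if the run is interrupted later
    with open("partial_results.txt", "a") as f:
        f.write(repr(result) + "\n")

if __name__ == "__main__":
    objects = range(1000)
    pool = multiprocessing.Pool(processes=4)
    for obj in objects:
        pool.apply_async(expensive_function, (obj,), callback=save_result)
    pool.close()
    pool.join()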

You could also look into something like joblib to handle this for you; I'm not very familiar with it, but it seems to be aimed at the same kind of problem.
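For instance, a rough sketch with joblib (assuming it's installed; the_function and list_of_objects are the same placeholders as in the Pool example above):

from joblib import Parallel, delayed

# n_jobs=-1 uses all available cores; verbose prints periodic progress
results = Parallel(n_jobs=-1, verbose=5)(
    delayed(the_function)(obj) for obj in list_of_objects
)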

Answered by Danica on Oct 27 '25

