
Optimise network bound multiprocessing code

Tags: python

I have a function I'm calling with multiprocessing.Pool, like this:

from multiprocessing import Pool

def ingest_item(id):
    # goes and does a lot of network calls
    # adds a bunch of rows to a remote db
    return None

if __name__ == '__main__':
    p = Pool(12)
    thing_ids = range(1000000)
    p.map(ingest_item, thing_ids)

The list p.map() iterates over contains around 1 million items; each ingest_item() call hits third-party services and adds data to a remote PostgreSQL database.

On a 12-core machine this processes ~1,000 items in 24 hours (at that rate, the full million would take almost three years). CPU and RAM usage are low.

How can I make this faster?

Would switching to threads make sense, since the bottleneck seems to be network calls?

Thanks in advance!

asked Dec 02 '25 by Pythonsnake99

1 Answer

First: remember that you are performing a network-bound task. You should expect your CPU and RAM usage to be low, because the network is orders of magnitude slower than your 12-core machine.
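
Since the work is I/O-bound, one low-risk variant of the asker's own threads idea (a minimal sketch, not from the original answer, and assuming ingest_item is thread-safe) is to keep the exact same Pool API but back it with threads via multiprocessing.dummy, and raise the worker count well past the core count:

from multiprocessing.dummy import Pool  # thread-backed Pool, same interface

def ingest_item(id):
    # network calls + remote db writes, as before
    return None

if __name__ == '__main__':
    # Threads are cheap for I/O-bound work, so you can run far more
    # than one per core; tune this against your API/db rate limits.
    p = Pool(200)
    thing_ids = range(1000000)
    # chunksize keeps scheduling overhead down for a million tiny tasks
    p.map(ingest_item, thing_ids, chunksize=100)

Threads share the GIL, but that barely matters here because each worker spends almost all its time blocked on network I/O rather than running Python bytecode.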

That said, it's wasteful to tie up a whole process for each in-flight request when each worker spends most of its time waiting on the network. If you run into problems from spawning too many processes, you might try pycurl, as suggested here: Library or tool to download multiple files in parallel.

This pycurl example looks very similar to your task: https://github.com/pycurl/pycurl/blob/master/examples/retriever-multi.py
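
For orientation, the core of that retriever-multi pattern looks roughly like this (a condensed, illustrative sketch; the URL list and connection count are placeholders, not part of the linked example):

import pycurl

NUM_CONN = 50  # illustrative concurrency limit
urls = ["https://example.com/item/%d" % i for i in range(1000)]  # placeholder URLs

m = pycurl.CurlMulti()
free = []
for _ in range(NUM_CONN):
    c = pycurl.Curl()
    c.setopt(pycurl.FOLLOWLOCATION, 1)
    free.append(c)

queue = list(urls)
num_active = 0
while queue or num_active:
    # start new transfers while we have spare handles and work left
    while queue and free:
        c = free.pop()
        c.setopt(pycurl.URL, queue.pop(0))
        c.setopt(pycurl.WRITEFUNCTION, lambda data: len(data))  # discard body
        m.add_handle(c)
        num_active += 1
    # drive all active transfers forward
    while True:
        ret, _ = m.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    # reap finished transfers and recycle their handles
    while True:
        num_q, ok_list, err_list = m.info_read()
        for c in ok_list:
            m.remove_handle(c)
            free.append(c)
            num_active -= 1
        for c, errno, errmsg in err_list:
            m.remove_handle(c)
            free.append(c)
            num_active -= 1
        if num_q == 0:
            break
    m.select(1.0)  # wait for network activity

The key idea is that a single thread multiplexes many concurrent HTTP transfers through CurlMulti, so concurrency is bounded by NUM_CONN rather than by process or thread count.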

answered Dec 05 '25 by Neal Ehardt


