
Performance of multiprocessing vs single processing

I am fairly new to Python and am currently looking at multiprocessing. I created a simple example that I assumed would be considerably quicker with multiprocessing than with a single process, but it turns out to be slower! The script builds a list of the integers from 0 to 9999 and splits it into shorter lists; worker processes then run through these and build the string "I am worker [integer]" for each entry, and the combined result is printed at the end. A typical run time is approximately 26 seconds, while the single-process script is 0.5 to 1 second faster. Is there a particular reason why my multiprocessing script is slower? Or is this simply a bad example to use for multiprocessing? The code for the two scripts is below for reference.

Multiprocessing code:

import multiprocessing
from datetime import datetime

def f(x):
    listagain=[]
    for i in x:
        listagain.append("I am worker " + str(i))
    return listagain    

def chunks(l, n):
    """Return a list of successive n-sized chunks from l."""
    lister=[]
    for i in xrange(0, len(l), n):
        lister.append(l[i:i+n])
    return lister

if __name__ == '__main__':
    startTime=datetime.now()
    Pool=multiprocessing.Pool
    mylist=list(xrange(10000))
    size=10
    listlist=[]
    listlist=chunks(mylist,size)
    workers=4
    pool=Pool(processes=workers)
    result=pool.map(f,listlist)
    pool.close()
    pool.join()
    print result
    print (datetime.now()-startTime)

Single processing code:

from datetime import datetime

def f(x):
    listagain=[]
    for i in x:
        for j in xrange(0,len(i)):
            listagain.append("I am worker " + str(i[j]))
    return listagain 

def chunks(l, n):
    """Return a list of successive n-sized chunks from l."""
    lister=[]
    for i in xrange(0, len(l), n):
        lister.append(l[i:i+n])
    return lister

if __name__ == '__main__':
    startTime=datetime.now()
    mylist=list(xrange(10000))
    size=10
    listlist=[]
    listlist=chunks(mylist,size)
    result=f(listlist)
    print result
    print (datetime.now()-startTime)
TTNor asked Dec 05 '25 05:12


1 Answer

Multiprocessing carries an overhead, usually associated with pickling Python objects so they can be sent to and from the worker processes. In your problem that overhead is probably higher than the time a single task takes to run. With bigger tasks the overhead becomes proportionally smaller, and using multiprocessing becomes advantageous.
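One way to see this for yourself is to time the same comparison for a cheap task and for a heavier one. The sketch below is only an illustration, not a benchmark: it is written in Python 3 syntax (the question uses Python 2), `busy_sum` is a hypothetical CPU-bound task made up for the example, and the task sizes and worker count are arbitrary assumptions. The pool timing deliberately includes starting the worker processes, since that is part of the cost you pay. Actual numbers depend on your machine, so none are quoted here.

```python
import multiprocessing
import time


def label_chunk(chunk):
    """Cheap task, as in the question: build one string per integer."""
    return ["I am worker " + str(i) for i in chunk]


def busy_sum(n):
    """Hypothetical CPU-heavy task: sum of the squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def chunks(items, n):
    """Return a list of successive n-sized slices of items."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def compare(func, tasks, workers=4):
    """Time func over tasks serially and with a Pool; return both timings."""
    start = time.perf_counter()
    serial = [func(t) for t in tasks]
    serial_time = time.perf_counter() - start

    # Pool start-up, pickling of tasks and results, and shutdown are all
    # inside the timed region on purpose.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=workers) as pool:
        parallel = pool.map(func, tasks)
    pool_time = time.perf_counter() - start

    assert serial == parallel  # both paths must produce the same results
    return serial_time, pool_time


if __name__ == "__main__":
    # Many tiny tasks: little work per item sent to a worker.
    cheap = compare(label_chunk, chunks(list(range(10000)), 10))
    # A few heavy tasks: a lot of work per item (sizes are arbitrary).
    heavy = compare(busy_sum, [300000] * 8)
    print("cheap tasks: serial %.4fs, pool %.4fs" % cheap)
    print("heavy tasks: serial %.4fs, pool %.4fs" % heavy)
```

If the explanation above applies to your setup, the pool should compare poorly against the serial loop for the cheap tasks and more favourably for the heavy ones, provided more than one core is available.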

Saullo G. P. Castro answered Dec 06 '25 23:12