I have already done some multiprocessing in the past, but this time, I can't figure out a workaround.
I know that I can only pickle functions if they are at the top level of a module. This has always worked well so far, but now I have to work with shared memory in an instance and I don't see a way to move the function to the top level.
Consider this:
import numpy as np
import multiprocessing
from itertools import repeat

class Test:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def my_task(self):
        # Create process pool
        p = multiprocessing.Pool(4)

        # Create shared memory arrays
        share1 = multiprocessing.Array("d", self.x, lock=False)
        share2 = multiprocessing.Array("d", self.y, lock=False)

        def mp(xc, yc, c):
            # This is just some random weird statement
            foo = np.sum(share1) + np.sum(share2) + xc + yc + c
            return foo

        def mp_star(args):
            return mp(*args)

        # Define some input for multiprocessing
        xs = [1, 2, 3, 4, 5]
        ys = [5, 6, 7, 8, 9]
        c = 10

        # Submit tasks
        result = p.map(mp_star, zip(xs, ys, repeat(c)))

        # Close pool
        p.close()

        return result

# Get some input data
x = np.arange(10)
y = x**2

# Run the thing
cl = Test(x=x, y=y)
cl.my_task()
You can see that I need to access shared data from the instance itself. That is why I put the multiprocessing parts inside the method my_task, and that in turn is why I get the typical pickle error
_pickle.PicklingError: Can't pickle <function Test.my_task.<locals>.mp_star at 0x10224a400>: attribute lookup mp_star on __main__ failed
which I already know about. I can't move the multiprocessing tasks to the top level though since I need to access the shared data. Also I want to keep the number of dependencies low so I need to work with the built-in multiprocessing libraries.
I hope the code makes sense. So, how can I use the shared memory space from an instance in multiprocessing? Is there a way to move the functions to the top level?
Since the only functions that can be pickled are those at the top level of a module (see the documentation for pickle), and multiprocessing wants to pickle the function, you're stuck with putting it at the top level. You simply have to rework your requirements.
For example, your functions already take arguments, so why not supply the shared data that way? Or you could put the shared data in a pickleable instance and keep the function at the top level (you can still pass a class instance to a top-level function).
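One caveat if the shared data has to stay a multiprocessing.Array: as far as I know, those arrays cannot be sent as task arguments through p.map, because they may only reach a worker when the process is started. A common workaround is a pool initializer, which is a different technique from the ones described in this answer. Here is a minimal sketch of it; init_worker and _shared are names I made up for illustration, and the arrays are read through np.frombuffer:

```python
import multiprocessing
from itertools import repeat

import numpy as np

# Per-worker storage, filled in by init_worker (hypothetical names).
_shared = {}


def init_worker(share1, share2):
    # Runs once in each worker process. The arrays arrive when the
    # process is started, not through the task queue.
    _shared["x"] = np.frombuffer(share1, dtype=np.float64)
    _shared["y"] = np.frombuffer(share2, dtype=np.float64)


def mp(xc, yc, c):
    # Top-level function, so it can be pickled by name.
    return float(np.sum(_shared["x"]) + np.sum(_shared["y"]) + xc + yc + c)


def mp_star(args):
    return mp(*args)


class Test:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def my_task(self):
        # Create shared memory arrays from the instance data
        share1 = multiprocessing.Array("d", self.x.tolist(), lock=False)
        share2 = multiprocessing.Array("d", self.y.tolist(), lock=False)

        xs = [1, 2, 3, 4, 5]
        ys = [5, 6, 7, 8, 9]
        c = 10

        # Hand the shared arrays to every worker at start-up
        with multiprocessing.Pool(
            4, initializer=init_worker, initargs=(share1, share2)
        ) as p:
            return p.map(mp_star, zip(xs, ys, repeat(c)))


if __name__ == "__main__":
    x = np.arange(10)
    cl = Test(x=x, y=x**2)
    print(cl.my_task())
```

The method still lives on the class and still builds the shared arrays from self.x and self.y; only the worker functions have moved to the top level.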
For example, if you want to put the shared data in an instance, you can define the method at the top level as if it were a normal method and then attach it in the class body:
def fubar(self):
    return self.x

class C(object):
    def __init__(self, x):
        self.x = x

    # Attach the top-level function as a method
    foo = fubar

c = C(1)
Now you can pickle fubar. You can call it either as c.foo() or as fubar(c), but you can only pickle it as pickle.dumps(fubar), so when it is unpickled it expects to be called the latter way. That means you have to supply the self parameter along with the other arguments in p.map, i.e. p.map(mp_star, zip(repeat(self), xs, ys, repeat(c))). Of course, you also have to make sure that self is pickleable.
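Applied to the code from the question, that could look roughly like the sketch below. It assumes the instance only holds plain NumPy arrays, so it pickles without trouble. Note that self is then pickled and sent along with the tasks, so the workers get copies of the data rather than real shared memory.

```python
import multiprocessing
from itertools import repeat

import numpy as np


def mp(self, xc, yc, c):
    # Top-level function that receives the instance explicitly.
    return float(np.sum(self.x) + np.sum(self.y) + xc + yc + c)


def mp_star(args):
    return mp(*args)


class Test:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def my_task(self):
        xs = [1, 2, 3, 4, 5]
        ys = [5, 6, 7, 8, 9]
        c = 10

        # self travels with the other arguments of each task
        with multiprocessing.Pool(4) as p:
            return p.map(mp_star, zip(repeat(self), xs, ys, repeat(c)))


if __name__ == "__main__":
    x = np.arange(10)
    cl = Test(x=x, y=x**2)
    print(cl.my_task())
```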