Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python multiprocessing array

Despite all the seemingly similar questions and answers, here goes:

I have a fairly large 2D numpy array and would like to process it row by row using multiprocessing. For each row I need to find specific (numeric) values and use them to set values in a second 2D numpy array. A small example (real use for array with appr. 10000x10000 cells):

import numpy as np
inarray = np.array([(1.5,2,3), (4,5.1,6), (2.7, 4.8, 4.3)])
outarray = np.array([(0.0,0.0,0.0), (0.0,0.0,0.0), (0.0,0.0,0.0)])

I would now like to process inarray row by row using multiprocessing, to find all the cells in inarray that are greater than 5 (e.g. inarray[1,1] and inarray[1,2], and set cells in outarray that have index locations one smaller in both dimensions (e.g. outarray[0,0] and outarray[0,1]) to 1.

After looking here and here and here I'm sad to say I still don't know how to do it. Help!

like image 759
user3634426 Avatar asked Mar 20 '26 16:03

user3634426


1 Answers

If you can use the latest numpy development version, then you can use multithreading instead of multiprocessing. Since this PR was merged a couple of months ago, numpy releases the GIL when indexing, so you can do something like:

import numpy as np
import threading

def target(in_, out):
    out[in_ > .5] = 1

def multi_threaded(a, thread_count=3):
    b = np.zeros_like(a)
    chunk = len(a) // thread_count
    threads = []
    for j in xrange(thread_count):
        sl_a = slice(1 + chunk*j,
                     a.shape[0] if j == thread_count-1 else 1 + chunk*(j+1),
                     None)
        sl_b = slice(sl_a.start-1, sl_a.stop-1, None)
        threads.append(threading.Thread(target=target, args=(a[sl_a, 1:],
                                                             b[sl_b, :-1])))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b

And now do things like:

In [32]: a = np.random.rand(100, 100000)

In [33]: %timeit multi_threaded(a, 1)
1 loops, best of 3: 121 ms per loop

In [34]: %timeit multi_threaded(a, 2)
10 loops, best of 3: 86.6 ms per loop

In [35]: %timeit multi_threaded(a, 3)
10 loops, best of 3: 79.4 ms per loop
like image 185
Jaime Avatar answered Mar 23 '26 07:03

Jaime



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!