
Python filter array with multiprocessing

Right now I'm filtering an array using

arr = [a for a in tqdm(replays) if check(a)]

However, with hundreds of thousands of elements this takes a lot of time. I was wondering whether it is possible to do this with multiprocessing, ideally in a compact, Pythonic way.

Thanks!

Boio asked Oct 29 '25 06:10


1 Answer

Define a parallel filter function pfilter that uses multiprocessing:

from multiprocessing import Pool

def pfilter(filter_func, arr, cores):
    with Pool(cores) as p:
        # evaluate the predicate on every element in parallel;
        # Pool.map returns the results in input order
        booleans = p.map(filter_func, arr)
    return [x for x, b in zip(arr, booleans) if b]

Pool.map distributes the predicate calls across worker processes, but it still returns the results in the same order as the input, so zipping them back against arr is safe.

The usage in your case is (with 4 CPU cores):

arr = pfilter(check, tqdm(replays), 4)

Note, however, that filter_func can't be a lambda or a locally defined function: multiprocessing pickles the callable to send it to the worker processes, and lambdas and nested functions aren't picklable.

Gwang-Jin Kim answered Oct 31 '25 20:10


