Multi-threaded NumPy inserts

I am considering creating a NumPy table to use as a key/value database. The inserts/updates would be multi-threaded.

While exploring the idea, I ran into two questions. First, would the GIL stop threads and allow only one update at a time? Second, can a NumPy table (tablespace) be multi-threaded?

Merlin asked Jan 23 '26 23:01


1 Answer

Some NumPy functions are not atomic, so if two threads operate on the same array by calling non-atomic NumPy functions, the array can become mangled because the operations interleave in an unanticipated order.

There are many examples, but to be concrete, numpy.apply_along_axis is implemented as a long sequence of Python statements, so it is clearly not atomic.

The GIL will not help you, since it can suspend one thread when it is only partway through a non-atomic NumPy function and then start another thread that operates on the same array.
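To make the interleaving problem concrete, here is a minimal sketch of a lost update. It does not use real threads: the two "threads" A and B are simulated by hand so the outcome is deterministic. The assumption being illustrated is that an update such as arr[0] += 1 is a read step followed by a write step, and that a thread switch can land between the two. The names a_read, b_read, and lost_update_result are made up for this illustration.

```python
import numpy as np

# One-slot array standing in for a shared table entry.
arr = np.zeros(1, dtype=np.int64)

# Hand-written interleaving of two simulated threads, A and B,
# each trying to perform arr[0] += 1 as separate read and write steps.
a_read = arr[0]        # A reads the current value (0)
b_read = arr[0]        # B reads before A has written (also 0)
arr[0] = a_read + 1    # A writes 1
arr[0] = b_read + 1    # B writes 1, silently overwriting A's update

# Two increments were attempted, but only one is visible.
lost_update_result = int(arr[0])
print(lost_update_result)  # prints 1, not 2
```

With real threads the same interleaving happens only occasionally, which is what makes this kind of bug hard to reproduce.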

So to be thread-safe, you would need to use a threading.Lock and only operate on the array after the Lock has been acquired:

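import threading
lock = threading.Lock()  # one lock shared by every thread that touches arr
with lock:
    arr = ...  # read or modify the array only while holding the lock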
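As a fuller sketch of the same pattern, the example below has several threads update one shared array, with every update guarded by a single threading.Lock. The table layout (array index as key, array entry as value), the constants N_SLOTS, N_THREADS, and UPDATES_PER_THREAD, and the worker function are all hypothetical choices for illustration, not part of the original question.

```python
import threading
import numpy as np

N_SLOTS = 8                 # hypothetical table size
N_THREADS = 4               # hypothetical number of writer threads
UPDATES_PER_THREAD = 1000   # hypothetical workload per thread

# The array index plays the role of the key, the entry is the value.
table = np.zeros(N_SLOTS, dtype=np.int64)
lock = threading.Lock()     # one lock shared by all writers


def worker():
    """Increment table entries, holding the lock for each update."""
    for i in range(UPDATES_PER_THREAD):
        key = i % N_SLOTS
        with lock:          # only one thread updates the array at a time
            table[key] += 1


threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every increment is accounted for because updates never interleave.
total = int(table.sum())
print(total)  # prints 4000
```

A single coarse lock like this is the simplest option. It also means the updates run strictly one at a time, so the threads add no parallelism to the array work itself.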

Having to use a lock everywhere calls into question whether there is any benefit to having multiple threads operate on the same array. Note that multithreading a CPU-bound problem can sometimes be slower than a comparable single-threaded version.

See also the ParallelProgramming with numpy and scipy wiki page for more alternatives and discussion.

unutbu answered Jan 25 '26 12:01


