I am considering using a numpy table as a key/value database. Inputs and updates would come from multiple threads.
Exploring the idea, I see two problems. Would the GIL block threads and allow only one update at a time? Can a numpy table (tablespace) be safely accessed by multiple threads?
Some numpy functions are not atomic. If two threads operate on the same array by calling non-atomic numpy functions, the array can become mangled, because the two threads' operations may interleave in an order nobody anticipated.
There are many examples, but to be concrete: numpy.apply_along_axis is implemented as a long sequence of Python statements, so it is clearly not atomic.
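One way to see that apply_along_axis runs ordinary Python code between slices is to count how often it calls a Python callback. In the sketch below, `row_sum` and the `calls` counter are made up for illustration. Each invocation is a normal Python function call, and the interpreter is free to switch to another thread between any two of them.

```python
import numpy as np

calls = 0  # how many times apply_along_axis invokes our Python callback


def row_sum(row):
    global calls
    calls += 1  # this is plain Python code running inside apply_along_axis
    return row.sum()


arr = np.arange(12).reshape(4, 3)
result = np.apply_along_axis(row_sum, 1, arr)

print(calls)   # one Python-level call per row (4 rows here)
print(result)
```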
The GIL will not help you, since the interpreter can pause one thread while it is only partway through a non-atomic numpy function and then run another thread that operates on the same array.
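Here is a small sketch of what a mid-update thread switch looks like. A real thread switch happens wherever the interpreter decides, so the bug shows up only intermittently; to make it reproducible, this example uses two `threading.Event` objects (an illustrative device, not something you would put in real code) to force the switch at a known point. The "record" is a hypothetical two-field row whose fields are supposed to always match.

```python
import threading

import numpy as np

# A two-field "record" whose fields should always hold the same value.
record = np.zeros(2, dtype=np.int64)
half_done = threading.Event()    # stands in for the interpreter pausing the writer
reader_done = threading.Event()
snapshot = None


def writer():
    record[0] = 1          # first half of a two-step update
    half_done.set()        # "preempted" here: let the reader run
    reader_done.wait()
    record[1] = 1          # second half of the update


def reader():
    global snapshot
    half_done.wait()
    snapshot = record.copy()   # observes the record halfway through the update
    reader_done.set()


t_w = threading.Thread(target=writer)
t_r = threading.Thread(target=reader)
t_w.start()
t_r.start()
t_w.join()
t_r.join()

print(snapshot)  # [1 0]: an inconsistent, half-updated record
print(record)    # [1 1]
```

The GIL was held correctly the whole time; it simply does not make a multi-step update atomic.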
So to be thread-safe, you would need to use a threading.Lock and only operate on the array after the Lock has been acquired:
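```python
import threading

lock = threading.Lock()  # one lock shared by every thread that touches arr

with lock:
    arr = ...  # read or modify arr only while holding the lock
```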
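Applied to the key/value idea from the question, a minimal sketch might look like the following. `LockedTable`, its methods, and the fixed `capacity`/`width` layout are all hypothetical choices for illustration: each key owns one row of a numpy array, and every read and write goes through a single `threading.Lock`.

```python
import threading

import numpy as np


class LockedTable:
    """Hypothetical key/value table: each key owns one row of a numpy array."""

    def __init__(self, capacity, width):
        self._data = np.zeros((capacity, width))
        self._index = {}               # key -> row number
        self._lock = threading.Lock()  # guards both _data and _index

    def put(self, key, values):
        with self._lock:
            row = self._index.get(key)
            if row is None:
                row = len(self._index)
                if row >= len(self._data):
                    raise RuntimeError("table is full")
                self._index[key] = row
            self._data[row] = values

    def add(self, key, delta):
        # Read-modify-write on a row; safe only because it runs under the lock.
        with self._lock:
            self._data[self._index[key]] += delta

    def get(self, key):
        with self._lock:
            # Return a copy so callers never see a row that is being rewritten.
            return self._data[self._index[key]].copy()


# Usage: 8 threads each add 1.0 to the same row 1000 times.
table = LockedTable(capacity=10, width=3)
table.put("a", [0.0, 0.0, 0.0])


def worker():
    for _ in range(1000):
        table.add("a", 1.0)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(table.get("a"))  # [8000. 8000. 8000.]
```

Note that the lock makes every operation run one at a time, which is exactly the trade-off discussed next.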
Having to use a lock everywhere calls into question whether there is any benefit to having multiple threads operate on the same array. Note also that multithreading a CPU-bound problem can sometimes be slower than a comparable single-threaded version.
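If the inputs must arrive from many threads, one common alternative (not part of the answer above, named here as a swapped-in technique) is to let only a single writer thread ever touch the array and have the other threads send it updates through a `queue.Queue`, which is thread-safe. The `writer`/`producer` functions, the `STOP` sentinel, and the table shape are illustrative assumptions.

```python
import queue
import threading

import numpy as np

table = np.zeros((4, 3))
updates = queue.Queue()  # thread-safe channel from producers to the writer
STOP = object()          # sentinel telling the writer to exit


def writer():
    # The only thread that ever modifies `table`, so no lock is needed here.
    while True:
        item = updates.get()
        if item is STOP:
            break
        row, delta = item
        table[row] += delta


def producer(row):
    # Producers never touch the array; they only enqueue (row, delta) updates.
    for _ in range(500):
        updates.put((row, 1.0))


w = threading.Thread(target=writer)
w.start()
producers = [threading.Thread(target=producer, args=(r,)) for r in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()
updates.put(STOP)  # queued after every update, so the writer drains them all first
w.join()

print(table)  # every entry is 500.0
```

Reads still need care: here the table is only inspected after the writer has exited. A long-running version would have to route reads through the writer thread as well, or protect them with a lock.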
See also the ParallelProgramming with numpy and scipy wiki page for more alternatives and discussion.