Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

faster numpy array copy; multi-threaded memcpy?

Suppose we have two large numpy arrays of the same data type and shape, of size on the order of GB's. What is the fastest way to copy all the values from one into the other?

When I do this using normal notation, e.g. A[:] = B, I see exactly one core on the computer at maximum effort doing the copy for several seconds, while the others are idle. When I launch multiple workers using multiprocessing and have them each copy a distinct slice into the destination array, such that all the data is copied, using multiple workers is faster. This is true regardless of whether the destination array is a shared memory array or one that becomes local to the worker. I can get a 5-10x speedup in some tests on a machine with many cores. As I add more workers, the speed does eventually level off and even slow down, so I think this achieves being memory-performance bound.

I'm not suggesting using multiprocessing for this problem; it was merely to demonstrate the possibility of better hardware utilization.

Does there exist a python interface to some multi-threaded C/C++ memcpy tool?

Update (03 May 2017)

When it is possible, using multiple python processes to move data can give major speedup. I have a scenario in which I already have several small shared memory buffers getting written to by worker processes. Whenever one fills up, the master process collects this data and copies it into a master buffer. But it is much faster to have the master only select the location in the master buffer, and assign a recording worker to actually do the copying (from a large set of recording processes standing by). On my particular computer, several GB can be moved in a small fraction of a second by concurrent workers, as opposed to several seconds by a single process.

Still, this sort of setup is not always (or even usually?) possible, so it would be great to have a single python process able to drop into a multi-threaded memcpy routine...

like image 375
Adam S. Avatar asked Feb 23 '26 12:02

Adam S.


1 Answers

If you are certain that the types/memory layout of both arrays are identical, this might give you a speedup: memoryview(A)[:] = memoryview(B) This should be using memcpy directly and skips any checks for numpy broadcasting or type conversion rules.

like image 57
David M Avatar answered Feb 25 '26 01:02

David M



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!