 

NumPy is faster than PyTorch for larger cross or outer products

I'm computing huge outer products between vectors of size (50500,) and found that NumPy is noticeably faster than PyTorch at it.

Here are the tests:

# NumPy

In [64]: a = np.arange(50500) 
In [65]: b = a.copy()  

In [67]: %timeit np.outer(a, b) 
5.81 s ± 56.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

-------------

# PyTorch

In [73]: t1 = torch.arange(50500)
In [76]: t2 = t1.clone()

In [79]: %timeit torch.ger(t1, t2) 
7.73 s ± 143 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I'd ideally like to have the computation done in PyTorch. So, how can I speed things up for computing outer product in PyTorch for such huge vectors?


Note: I tried moving the tensors to the GPU, but got a MemoryError because the result needs around 19 GiB of space. So I eventually have to do it on the CPU.
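For reference, that 19 GiB figure falls straight out of the shapes: `torch.arange` produces int64 tensors, so the (50500, 50500) result holds 8 bytes per element. A quick sanity check:

```python
# Memory estimate for the outer product of two length-50500 int64 vectors.
n = 50500
bytes_needed = n * n * 8      # (n, n) result, 8 bytes per int64 element
gib = bytes_needed / 2**30    # convert bytes to GiB
print(f"{gib:.1f} GiB")       # prints "19.0 GiB", matching the MemoryError
```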

Asked Nov 02 '25 by kmario23

1 Answer

Unfortunately, there's no practical way to speed up torch.ger(), PyTorch's method for computing the outer product, without a vast amount of effort.

Explanation and Options

The reason NumPy's np.outer() is so fast is that it bottoms out in C, which you can see here: https://github.com/numpy/numpy/blob/7e3d558aeee5a8a5eae5ebb6aef03de892a92ebd/numpy/core/numeric.py#L1123 where the function dispatches to multiply operations from the umath C source code.

PyTorch's torch.ger() function is written in C++ here: https://github.com/pytorch/pytorch/blob/7ce634ebc2943ff11d2ec727b7db83ab9758a6e0/aten/src/ATen/native/LinearAlgebra.cpp#L142 and, as your benchmark shows, it comes out somewhat slower (about 7.7 s versus 5.8 s).

Your options to "speed up computing outer product in PyTorch" would be to add a faster C implementation for the outer product to PyTorch's native code, or to write your own outer-product function that interfaces with C (using something like Cython) if you really don't want to use NumPy, which wouldn't make much sense.
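If the only requirement is that the result end up as a torch tensor, there's a much cheaper middle ground: let np.outer() do the computation and wrap the result with torch.from_numpy(), which shares the ndarray's memory rather than copying it. (Newer PyTorch releases also expose torch.outer(), of which torch.ger() is a deprecated alias, but it won't change the CPU performance picture here.) A minimal sketch, shown with small vectors since the size-50500 case needs ~19 GiB of RAM:

```python
import numpy as np
import torch

# Small vectors for demonstration; the same pattern applies unchanged
# to the size-50500 vectors above, given enough RAM for the result.
a = np.arange(5)
b = a.copy()

# Compute the outer product in NumPy, then view it as a torch tensor.
# torch.from_numpy shares memory with the ndarray (zero copy), so the
# conversion adds essentially no overhead on top of np.outer itself.
out = torch.from_numpy(np.outer(a, b))

print(out.shape)   # torch.Size([5, 5])
print(out[2, 3])   # tensor(6), i.e. a[2] * b[3]
```

Note that because the tensor and the ndarray share storage, mutating one mutates the other.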

P.S.

Also, just as an aside: a GPU would only speed up the parallel computation itself, and that gain may not outweigh the time spent transferring data between RAM and GPU memory.

Answered Nov 04 '25 by A. Chang