Most efficient way to rearrange 2D numpy array from another 2D index array

Question

In brief

In Python 3.6 and using Numpy, what would be the most efficient way to rearrange the elements of a 2D array according to indices present in a different, similarly shaped, index 2D array?

Detailed

Suppose I have the following two 9 x 5 arrays, called A and B:

import numpy as np
A = np.array([[0.32, 0.35, 0.88, 0.63, 1.  ],
              [0.23, 0.69, 0.98, 0.22, 0.96],
              [0.7 , 0.51, 0.09, 0.58, 0.19],
              [0.98, 0.42, 0.62, 0.94, 0.46],
              [0.48, 0.59, 0.17, 0.23, 0.98]])

B = np.array([[4, 0, 3, 2, 1],
              [3, 2, 4, 1, 0],
              [4, 3, 0, 2, 1],
              [4, 2, 0, 3, 1],
              [0, 3, 1, 2, 4]])

I can successfully rearrange A using B as an index array by it by np.array(list(map(lambda i, j: j[i], B, A))):

array([[1.  , 0.32, 0.63, 0.88, 0.35],
       [0.22, 0.98, 0.96, 0.69, 0.23],
       [0.19, 0.58, 0.7 , 0.09, 0.51],
       [0.46, 0.62, 0.98, 0.94, 0.42],
       [0.48, 0.23, 0.59, 0.17, 0.98]])

However, when the dimensions of A and B increase, such a solution becomes really inefficient. If I am not mistaken, that is because:

using the lambda loops over all rows of A instead of relying on Numpy vectorizations
mapping is slow
converting list to array eats precious time.

Since in my real use case those arrays can grow quite big, and I have to reorder many of them in a long loop, a lot of my current performance bottleneck (measured with a profiler) comes from that single line of code above.

My question: what would the most efficient, more Numpy-smart way of achieving the above?

A toy code to test general arrays and time the process could be:

import numpy as np
nRows = 20000
nCols = 10000
A = np.round(np.random.uniform(0, 1, (nRows, nCols)), 2)
B = np.full((nRows, nCols), range(nCols))
for r in range(nRows):
    np.random.shuffle(B[r])
%time X = np.array(list(map(lambda i, j: j[i], B, A)))

bousof · Accepted Answer

A comparison with three other possibilities:

import numpy as np
import time

# Input
nRows = 20000
nCols = 10000
A = np.round(np.random.uniform(0, 1, (nRows, nCols)), 2)
B = np.full((nRows, nCols), range(nCols))
for r in range(nRows):
  np.random.shuffle(B[r])

# Original
t_start = time.time()
X = np.array(list(map(lambda i, j: j[i], B, A)))
print('Timer 1:', time.time()-t_start, 's')

# FOR loop
t_start = time.time()
X = np.zeros((nRows, nCols))
for i in range(nRows):
  X[i] = A[i][B[i]]
print('Timer 2:', time.time()-t_start, 's')

# take_along_axis
t_start = time.time()
X = np.take_along_axis(A,B,1)
print('Timer 3:', time.time()-t_start, 's')

# Indexing
t_start = time.time()
X = A[ np.arange(nRows)[:,None],B]
print('Timer 4:', time.time()-t_start, 's')

Ouput:

% python3 script.py
Timer 1: 2.191567897796631 s
Timer 2: 1.3516249656677246 s
Timer 3: 1.675267219543457 s
Timer 4: 1.646852970123291 s

For low number of columns (nRows,nCols)=(200000,10) the results are completely different however:

% python3 script.py
Timer 1: 0.2729799747467041 s
Timer 2: 0.22678399085998535 s
Timer 3: 0.016162633895874023 s
Timer 4: 0.014748811721801758 s

Most efficient way to rearrange 2D numpy array from another 2D index array

Tags:

performance

python

arrays

numpy

In brief

Detailed

AbbieW

1 Answers

bousof

Recent Activity

Donate For Us

Most efficient way to rearrange 2D numpy array from another 2D index array

Tags:

performance

python

arrays

numpy

In brief

Detailed

AbbieW

1 Answers

bousof

Related questions

Recent Activity

Donate For Us