Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy advanced indexing using a 2D array of row indices without broadcasting the output

I have an ndarray array with ndim 3, and some indices ndarray idxs with ndim 2, which specify indices for the first dimension of array. The first dimension of idxs matches the second dimension of array, i.e. idxs.shape[0] == array.shape[1].

I want to get a resulting ndarray result with ndim 3 and shape (idxs.shape[1], array.shape[1], array.shape[2]) like this:

for i0 in range(idxs.shape[1]):
    for i1 in range(array.shape[1]):
        result[i0, i1] = array[idxs[i1, i0], i1]

How can I get this more directly?

I thought about using advanced indexing but I'm not exactly sure how that would look like.

In Theano, the following works:

dim1 = theano.tensor.arange(array.shape[1])
result = array[idxs[dim1], dim1]
like image 574
Albert Avatar asked Oct 23 '25 17:10

Albert


1 Answers

Your for loop does this:

out[i, j] == array[idxs[j, i], j]

That is to say, the j,ith element in idxs gives the row index into array for the i,jth element in out. The corresponding set of column indices into array are just the sequence integers between 0 and idxs.shape[0] - 1 (which happens to be the same as array.shape[1] - 1 in this case, but need not be in general).

Your for loop can therefore be replaced with a single array indexing operation like this:

def simplified(array, idxs):
    return array[idxs.T, np.arange(idxs.shape[0])]

We can test for correctness and speed against the functions in @Divakar's answer:

m, n = 500, 400
array = np.random.rand(m, n)
idxs = np.random.randint(n, size=(n, m))

print(np.allclose(forloop(array, idxs), simplified(array, idxs)))
# True

%timeit forloop(array, idxs)
# 10 loops, best of 3: 101 ms per loop

%timeit broadcasted_indexing(array, idxs)
# 100 loops, best of 3: 4.1 ms per loop

%timeit simplified(array, idxs)
# 1000 loops, best of 3: 1.66 ms per loop
like image 100
ali_m Avatar answered Oct 25 '25 07:10

ali_m