Finding indices of k top values in a 2D array/matrix

Tags: python, numpy

I have a 2D matrix of values, and I want to find the indices of the top 5 values. For example, for

matrix([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
        [0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
        [0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
        [0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
        [0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])

I want to get (0,3), (0,1), (0,4), (3,1), (4,1)

I searched and tried many workarounds, including np.argmax(), np.argsort(), and np.argpartition(), without any good results. For example:

>>np.dstack(np.unravel_index(np.argsort(a.ravel(),axis=None), a.shape))

array([[[0, 4],
        [0, 3],
        [0, 2],
        [2, 4],
        [4, 4],
        [1, 4],
        [3, 4],
        [3, 3],
        [1, 3],
        [2, 3],
        [1, 2],
        [4, 3],
        [3, 2],
        [4, 2],
        [2, 2],
        [2, 1],
        [1, 1],
        [0, 1],
        [1, 0],
        [2, 0],
        [4, 1],
        [3, 1],
        [4, 0],
        [0, 0],
        [3, 0]]], dtype=int64)

This result makes no sense to me. Note that I want the original indices, and I don't care about the order (I just want the top 5 in any order, though ascending would be better).

M.F asked Sep 06 '25 06:09


2 Answers

np.argpartition is an efficient tool for getting the top k indices when their order does not need to be maintained. Hence, for array data a, it would be -

In [43]: np.c_[np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)]
Out[43]: 
array([[4, 1],
       [3, 1],
       [4, 0],
       [0, 0],
       [3, 0]])

To explain, let's break it down into individual steps -

# Get partitioned indices such that the last 5 indices refer to the top 5
# values taken globally from the input array. Refer to docs for more info
# Note that it's global because we will flatten it. 
In [9]: np.argpartition(a.ravel(),-5)
Out[9]: 
array([14, 24,  2,  3,  4, 23, 22,  7,  8,  9, 19, 18, 17, 13, 12, 11,  6,
        1,  5, 10, 21, 16, 20,  0, 15])

# Get last 5 indices, which are the top 5 valued indices
In [10]: np.argpartition(a.ravel(),-5)[-5:]
Out[10]: array([21, 16, 20,  0, 15])

# Convert the global indices back to row-col format
In [11]: np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)
Out[11]: (array([4, 3, 4, 0, 3]), array([1, 1, 0, 0, 0]))

# Stack into two-columnar array
In [12]: np.c_[np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)]
Out[12]: 
array([[4, 1],
       [3, 1],
       [4, 0],
       [0, 0],
       [3, 0]])
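
The question mentions that an ordered result would be preferable. As a minimal sketch, the steps above can be wrapped in a function that additionally sorts just the k selected entries by value. The name top_k_indices is hypothetical (not a NumPy function), and the sketch assumes a is a plain 2D ndarray rather than np.matrix:

```python
import numpy as np

def top_k_indices(a, k):
    """Return (row, col) pairs of the k largest values of a, largest first."""
    flat = a.ravel()
    # argpartition leaves the indices of the k largest values in the
    # last k slots, in no particular order
    top = np.argpartition(flat, -k)[-k:]
    # sort only those k candidates by value, then reverse for descending order
    top = top[np.argsort(flat[top])[::-1]]
    # convert flat indices back to (row, col) and stack into a (k, 2) array
    return np.column_stack(np.unravel_index(top, a.shape))

a = np.array([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
              [0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
              [0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
              [0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
              [0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])

print(top_k_indices(a, 5))
# rows: (3, 0), (0, 0), (4, 0), (3, 1), (4, 1)
```

Dropping the [::-1] gives ascending order instead. Only k elements are sorted, so the partition step still does most of the work on large inputs.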

For matrix data in a, it would be -

In [48]: np.dstack(np.unravel_index(np.argpartition(a.ravel(),-5)[:,-5:],a.shape))
Out[48]: 
array([[[4, 1],
        [3, 1],
        [4, 0],
        [0, 0],
        [3, 0]]])

So, compared to the array case, the differences are the [:,-5:] indexing and the use of np.dstack, because matrix data always stays 2D (a.ravel() on a matrix returns a 1 x N matrix rather than a 1D array).
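
If the 2D behaviour of np.matrix gets in the way, one possible alternative is to convert to a plain ndarray first with np.asarray, after which the array version applies unchanged. A small sketch on toy data (the values in m are made up for illustration and are not from the question):

```python
import numpy as np

m = np.matrix([[1, 9],
               [7, 3]])          # toy matrix, illustration only
arr = np.asarray(m)              # plain ndarray holding the same values

# same recipe as for array data: partition, take the last k, unravel
k = 2
idx = np.argpartition(arr.ravel(), -k)[-k:]
pairs = np.c_[np.unravel_index(idx, arr.shape)]
print(pairs)                     # the two rows (0, 1) and (1, 0), in some order
```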

Note that these are the same as the last 5 rows of the argsort output in your question.

Divakar answered Sep 09 '25 23:09


Your sample:

n = np.array([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
              [0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
              [0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
              [0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
              [0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])

Your expected output is not the indices of the top 5 values. The top 5 values are

array([0.14538635, 0.15853737, 0.1695696 , 0.17542851, 0.19032505])

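To get their indices: sort the flattened array, take the last 5 values, and use isin to flag their locations as True. Finally, use argwhere to get their positions. The result is in row-major order, not in order of value.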

np.argwhere(np.isin(n, np.sort(n, axis=None)[-5:]))

Out[324]:
array([[0, 0],
       [3, 0],
       [3, 1],
       [4, 0],
       [4, 1]], dtype=int32)
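
One caveat worth noting: because this approach selects by value, it can return more than k positions when the k-th largest value is tied. A small sketch on toy data (the array t is made up for illustration) comparing it with the argpartition approach, which always returns exactly k indices:

```python
import numpy as np

t = np.array([[5, 1],
              [5, 5]])           # toy data with a three-way tie at the top
k = 2

# value-based selection: every cell equal to one of the top-k values is flagged
by_value = np.argwhere(np.isin(t, np.sort(t, axis=None)[-k:]))

# index-based selection: exactly k flat indices are kept
by_index = np.c_[np.unravel_index(np.argpartition(t.ravel(), -k)[-k:], t.shape)]

print(len(by_value))   # 3
print(len(by_index))   # 2
```

For data without repeated values, such as the sample above, both approaches select the same 5 cells.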
Andy L. answered Sep 09 '25 23:09