I have a 2D matrix of values, and I want to find the indices of the top 5 values. For example, for
matrix([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
[0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
[0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
[0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
[0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])
I want to get (0,3), (0,1), (0,4), (3,1), (4,1)
I searched and tried many workarounds, including np.argmax(), np.argsort(), and np.argpartition(),
without any good results.
For example:
>>> np.dstack(np.unravel_index(np.argsort(a.ravel(),axis=None), a.shape))
array([[[0, 4],
[0, 3],
[0, 2],
[2, 4],
[4, 4],
[1, 4],
[3, 4],
[3, 3],
[1, 3],
[2, 3],
[1, 2],
[4, 3],
[3, 2],
[4, 2],
[2, 2],
[2, 1],
[1, 1],
[0, 1],
[1, 0],
[2, 0],
[4, 1],
[3, 1],
[4, 0],
[0, 0],
[3, 0]]], dtype=int64)
This result makes no sense. Note that I want the original indices, and I don't care about the order (I just want the top 5 in any order, though ascending would be better).
np.argpartition should be a good (and efficient) tool to get those top k indices without maintaining order. Hence, for array data a, it would be:
In [43]: np.c_[np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)]
Out[43]:
array([[4, 1],
[3, 1],
[4, 0],
[0, 0],
[3, 0]])
To explain, let's break it down into individual steps:
# Get partitioned indices such that the last 5 indices refer to the top 5
# values taken globally from the input array. Refer to docs for more info
# Note that it's global because we will flatten it.
In [9]: np.argpartition(a.ravel(),-5)
Out[9]:
array([14, 24, 2, 3, 4, 23, 22, 7, 8, 9, 19, 18, 17, 13, 12, 11, 6,
1, 5, 10, 21, 16, 20, 0, 15])
# Get last 5 indices, which are the top 5 valued indices
In [10]: np.argpartition(a.ravel(),-5)[-5:]
Out[10]: array([21, 16, 20, 0, 15])
# Convert the global indices back to row-col format
In [11]: np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)
Out[11]: (array([4, 3, 4, 0, 3]), array([1, 1, 0, 0, 0]))
# Stack into two-columnar array
In [12]: np.c_[np.unravel_index(np.argpartition(a.ravel(),-5)[-5:],a.shape)]
Out[12]:
array([[4, 1],
[3, 1],
[4, 0],
[0, 0],
[3, 0]])
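Since np.argpartition makes no promise about the order inside the last k slots, the steps above can be wrapped in a small helper that also sorts just those k candidates by value. This is only a sketch: the name top_k_indices is made up for illustration, and it assumes a plain 2D ndarray (not np.matrix) with k no larger than the number of elements.

```python
import numpy as np

def top_k_indices(arr, k):
    """Return (row, col) pairs of the k largest values, ascending by value."""
    flat = arr.ravel()
    # Unordered flat indices of the k largest values.
    top = np.argpartition(flat, -k)[-k:]
    # Sort only those k candidates by their values (cheap, since k is small).
    top = top[np.argsort(flat[top])]
    # Convert flat indices back to (row, col) and stack them as rows.
    return np.column_stack(np.unravel_index(top, arr.shape))

a = np.array([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
              [0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
              [0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
              [0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
              [0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])

result = top_k_indices(a, 5)
print(result)
```

For the sample data this gives the same five pairs as above, with the smallest of the top 5 first and the largest last.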
For matrix data in a, it would be:
In [48]: np.dstack(np.unravel_index(np.argpartition(a.ravel(),-5)[:,-5:],a.shape))
Out[48]:
array([[[4, 1],
[3, 1],
[4, 0],
[0, 0],
[3, 0]]])
So, compared to the array case, the differences are the 2D slice [:,-5:] and the use of np.dstack, because with matrix data, the data always stays 2D.
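If the 2D-only behaviour of matrix data gets in the way, one option (not part of the recipe above, just a possible alternative) is to convert to a regular ndarray first with np.asarray, after which the array version applies unchanged. The small matrix m below is toy data chosen for illustration.

```python
import numpy as np

# Toy 2x2 matrix, for illustration only.
m = np.matrix([[0.3, 0.1],
               [0.2, 0.4]])

# A matrix stays 2D even after ravel(); an ndarray view of it does not.
arr = np.asarray(m)
flat = arr.ravel()
print(m.ravel().ndim, flat.ndim)

# Same recipe as for array data: partition, take the last k, unravel.
k = 2
top = np.argpartition(flat, -k)[-k:]
pairs = np.c_[np.unravel_index(top, arr.shape)]
print(pairs)
```

The two rows in pairs may come back in either order, since argpartition does not sort the selected elements.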
Notice that these are the same as the last 5 rows of your own argsort result.
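In other words, the full-sort attempt from the question was already close: argsort orders the flat indices by ascending value, so the top 5 are simply the last 5 entries. A minimal sketch, assuming the data is held in a plain ndarray (named a here) rather than np.matrix:

```python
import numpy as np

a = np.array([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
              [0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
              [0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
              [0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
              [0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])

# axis=None sorts the flattened array; the last 5 flat indices are the top 5.
top5_flat = np.argsort(a, axis=None)[-5:]
top5 = np.c_[np.unravel_index(top5_flat, a.shape)]
print(top5)
```

This sorts every element, so it does more work than argpartition on large inputs, but it returns the pairs in ascending order of value without a second step.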
Your sample:
n = np.array([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
[0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
[0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
[0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
[0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])
Your expected output is not the indices of the top 5 values. The top 5 values are
array([0.14538635, 0.15853737, 0.1695696 , 0.17542851, 0.19032505])
To get their indices, sort the values and use isin to flag their locations as True. Finally, use argwhere to get their positions:
np.argwhere(np.isin(n, np.sort(n, axis=None)[-5:]))
Out[324]:
array([[0, 0],
[3, 0],
[3, 1],
[4, 0],
[4, 1]], dtype=int32)
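Two properties of this approach are worth keeping in mind: the positions come back in row-major order rather than ordered by value, and because isin matches on values, repeated values can produce more than k positions. The sketch below wraps the one-liner in a function to show both; the name top_k_positions_isin and the small ties array are made up for illustration.

```python
import numpy as np

def top_k_positions_isin(arr, k):
    # The k largest values, taken from the fully sorted, flattened data.
    top_values = np.sort(arr, axis=None)[-k:]
    # Flag every cell holding one of those values, then list the flagged positions.
    return np.argwhere(np.isin(arr, top_values))

n = np.array([[0.17542851, 0.13199346, 0.01579704, 0.01429822, 0.01302919],
              [0.13279703, 0.12444886, 0.04742024, 0.03114371, 0.02623729],
              [0.13502306, 0.07815065, 0.07291175, 0.03690815, 0.02163695],
              [0.19032505, 0.15853737, 0.05889324, 0.02791679, 0.02699252],
              [0.1695696 , 0.14538635, 0.07127667, 0.04997876, 0.02580234]])

positions = top_k_positions_isin(n, 5)
print(positions)  # five rows, in row-major order

# With repeated values, more than k positions can match.
ties = np.array([[1, 3],
                 [3, 3]])
tied_positions = top_k_positions_isin(ties, 2)
print(len(tied_positions))  # 3, because the value 3 appears three times
```

If exactly k positions are required even with ties, an index-based method such as argpartition or argsort is the safer choice.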