I have the following square DataFrame:
In [104]: d
Out[104]:
           a          b          c          d          e
a        inf   5.909091   8.636364   7.272727   4.454545
b   7.222222        inf   8.666667   7.666667   1.777778
c  15.833333  13.000000        inf   9.166667  14.666667
d   4.444444   3.833333   3.055556        inf   4.833333
e  24.500000   8.000000  44.000000  43.500000        inf
This is a modified distance matrix representing pairwise distances between the objects ['a','b','c','d','e'], where each row is divided by a coefficient (weight) and all diagonal elements are artificially set to np.inf.
How can I get a list/vector of indices like the following in an efficient (vectorized) way:
d   # index of minimal element in the column `a`
a   # index of minimal element in the column `b` (excluding already found indices: [d]) 
b   # index of minimal element in the column `c` (excluding already found indices: [d,a]) 
c   # index of minimal element in the column `d` (excluding already found indices: [d,a,b]) 
That is, in the first column we found index d, so when we search for the minimum in the second column we exclude the row with index d (found previously in the first column); the result is a.
When we look for the minimum in the third column we exclude the rows with the indices found previously (['d','a']); the result is b.
When we look for the minimum in the fourth column we exclude the rows with the indices found previously (['d','a','b']); the result is c.
I don't need diagonal (inf) elements, so the resulting list/vector will contain d.shape[0] - 1 elements.
That is, the resulting list will look like ['d','a','b','c'] or, in the case of a NumPy solution, the corresponding numerical indices: [3,0,1,2]
It's not a problem to do this with a slow for-loop solution, but I can't wrap my head around a vectorized (fast) solution...
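A loop is the only solution I can see here, because the minimum for each column depends on which rows were already picked for the earlier columns.
But you can use numpy + numba to optimise it.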
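import numpy as np
from numba import jit

@jit(nopython=True)
def get_min_lookback(A, res):
    # A is modified in place: each chosen row is overwritten with inf
    for i in range(A.shape[1]):
        res[i] = np.argmin(A[:, i])
        A[res[i], :] = np.inf
    return res

arr = d.values.copy()  # copy, so the DataFrame itself is not overwritten
# drop the last entry: in the final column only inf values are left
get_min_lookback(arr, np.zeros(arr.shape[1], dtype=int))[:-1]
# array([3, 0, 1, 2])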
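For checking the result without numba, the same greedy loop can be sketched in plain NumPy. This is only an illustration under a few assumptions: the helper name `greedy_column_argmin` is made up here, the matrix is assumed to be square as in the question, and the values are typed in by hand from the DataFrame above. Instead of overwriting rows with inf, it keeps a boolean mask of the rows that are still available, so the input array is left untouched.

```python
import numpy as np

def greedy_column_argmin(A):
    """Hypothetical helper: for each column except the last, return the row
    index of the smallest value among rows not chosen for an earlier column.
    Assumes a square matrix, as in the question."""
    A = np.asarray(A, dtype=float)
    n_cols = A.shape[1]
    available = np.ones(A.shape[0], dtype=bool)  # rows still eligible
    res = np.empty(n_cols - 1, dtype=np.int64)
    for i in range(n_cols - 1):
        # Hide already-used rows behind inf without modifying A itself.
        col = np.where(available, A[:, i], np.inf)
        res[i] = np.argmin(col)
        available[res[i]] = False
    return res

inf = np.inf
# Values copied from the DataFrame in the question (rows/columns a..e).
A = np.array([
    [inf, 5.909091, 8.636364, 7.272727, 4.454545],
    [7.222222, inf, 8.666667, 7.666667, 1.777778],
    [15.833333, 13.000000, inf, 9.166667, 14.666667],
    [4.444444, 3.833333, 3.055556, inf, 4.833333],
    [24.500000, 8.000000, 44.000000, 43.500000, inf],
])
labels = np.array(list("abcde"))

idx = greedy_column_argmin(A)
print(idx.tolist())          # [3, 0, 1, 2]
print(labels[idx].tolist())  # ['d', 'a', 'b', 'c']
```

The loop is still sequential, so this is unlikely to be faster than the numba version on large matrices; it is mainly useful as a reference to compare against.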