
How to get the index of the first row that meets the conditions of a mask?

This is my DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)

And this is the expected output. I want to create column c:

       a      b      c
100   100   1000    NaN
101  1123  11123    NaN
102   123   1123    NaN
103   100      0    3.0
104     1     55    NaN
105     0      0    NaN
106     1      1    NaN

The mask that is used is:

mask = df.a > df.b

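I want to get the position of the first row where the mask is True. I want to preserve the original index, but the value written to c should be the positional index, i.e. the one the row would have after reset_index(). In this example the first True in the mask is at position 3 (index label 103).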

I can mark the first row that satisfies the mask like this:

df.loc[mask.cumsum().eq(1) & mask, 'c'] = 'the first row'

But I don't know how to get its position.

asked Oct 15 '25 04:10 by Amir


2 Answers

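The code below evaluates each row of the DataFrame with .apply() and, when the condition a > b is met, returns the row's positional index via df.index.get_loc(). The results are written to a new column 'c'. Note that this fills c for every row that satisfies the condition, not only the first; with the sample data only one row qualifies, so the output matches the expected one.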

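import numpy as np

# row.name is the index label; get_loc() converts it to a 0-based position.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
df['c'] = df.apply(
    lambda row: df.index.get_loc(row.name) if row.a > row.b else np.nan,
    axis=1
)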

Result:

        a      b    c
100   100   1000  NaN
101  1123  11123  NaN
102   123   1123  NaN
103   100      0  3.0
104     1     55  NaN
105     0      0  NaN
106     1      1  NaN
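
A vectorized alternative, offered as a sketch rather than as this answer's method: .apply() calls the lambda once per row and fills c for every matching row, so when only the first match is wanted, np.flatnonzero on the mask gives the 0-based positions directly. The names positions and first_pos are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)

# Boolean NumPy array, so everything below is purely positional.
mask = (df['a'] > df['b']).to_numpy()

# 0-based positions of all rows where the mask is True.
positions = np.flatnonzero(mask)

df['c'] = np.nan
first_pos = None
if positions.size:  # guard: the mask may be False everywhere
    first_pos = int(positions[0])
    # Write the position into column 'c' at that same row position.
    df.iloc[first_pos, df.columns.get_loc('c')] = first_pos
```

Because the lookup is positional, this does not depend on the index labels being unique or sorted.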

answered Oct 17 '25 18:10 by MuhammedYunus SaveGaza


Code

This code can be adapted to find the second or third match as well, not only the first, for example by changing eq(0) to eq(1) or eq(2).

cond1 = df['a'] > df['b']
cond2 = df.groupby(cond1).cumcount().eq(0)
df.loc[cond1 & cond2, 'c'] = 'the first row'

df:

        a      b              c
100   100   1000            NaN
101  1123  11123            NaN
102   123   1123            NaN
103   100      0  the first row
104     1     55            NaN
105     0      0            NaN
106     1      1            NaN
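
To illustrate the n-th match variant, here is a minimal sketch. The helper name nth_match_mask and the small DataFrame are hypothetical, chosen because the sample data in the question has only one matching row.

```python
import pandas as pd

def nth_match_mask(frame, cond, n):
    """Boolean Series that is True only at the n-th (1-based) row where cond is True."""
    # cumcount() numbers the rows within the True group and within the False
    # group separately, starting at 0; "& cond" keeps only the True group.
    return cond & frame.groupby(cond).cumcount().eq(n - 1)

# Hypothetical data with three matching rows (labels 100, 102, 103).
df = pd.DataFrame({'a': [5, 0, 7, 9], 'b': [1, 1, 1, 1]}, index=range(100, 104))
cond = df['a'] > df['b']

second = nth_match_mask(df, cond, 2)
df.loc[second, 'c'] = 'the second row'
```

With this data the second match is the row labelled 102; if there are fewer than n matches, the mask is False everywhere and nothing is written.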

If you are only looking for the first value, the following code may be simpler:

df.loc[df['a'].gt(df['b']).cummax().cumsum().eq(1), 'c'] = 'the first row'
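
The one-liner chains three steps; the sketch below splits them into intermediate variables (seen, count and first are illustrative names) to show why only the first match is selected.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)

cond = df['a'].gt(df['b'])   # True only where a > b
seen = cond.cummax()         # False before the first match, True from it onward
count = seen.cumsum()        # counts rows starting at the first match: 0, 0, 0, 1, 2, 3, 4
first = count.eq(1)          # the count equals 1 only on the first matching row

df.loc[first, 'c'] = 'the first row'
```

If no row matches, count stays at 0 and nothing is written.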

Updated Answer

If you only want the positional location of the index, use the following code:

cond1 = df['a'] > df['b']
idx = cond1.idxmax()
loc = df.index.get_loc(idx)

loc:

3

df.loc[df.index == idx, 'c'] = loc

df:

        a      b    c
100   100   1000  NaN
101  1123  11123  NaN
102   123   1123  NaN
103   100      0    3
104     1     55  NaN
105     0      0  NaN
106     1      1  NaN
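
One caveat with idxmax() on a boolean Series: when no row satisfies the condition, every value is False and idxmax() still returns the first label. A minimal sketch of a guard follows, assuming a unique index; the helper name first_true_position is hypothetical.

```python
import pandas as pd

def first_true_position(cond):
    """Return the 0-based position of the first True in cond, or None if there is none."""
    if not cond.any():
        # Without this check, idxmax() would return the first label.
        return None
    # With a unique index, get_loc() returns a single integer position.
    return cond.index.get_loc(cond.idxmax())

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)

pos = first_true_position(df['a'] > df['b'])
no_match = first_true_position(df['a'] > 10**6)
```

Here pos is 3 and no_match is None, so the caller can skip the assignment to c when nothing matches.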

answered Oct 17 '25 18:10 by Panda Kim


