This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)
I want to create column c. This is the expected output:
       a      b      c
100   100   1000    NaN
101  1123  11123    NaN
102   123   1123    NaN
103   100      0    3.0
104     1     55    NaN
105     0      0    NaN
106     1      1    NaN
The mask I use is:
mask = df.a > df.b
I want the index of the first row where the mask is True. I want to keep the original index, but I need the positional value that reset_index() would give (0-based). In this example the first True row has label 103, which is position 3.
I can mark the first row where the mask is True like this:
df.loc[mask.cumsum().eq(1) & mask, 'c'] = 'the first row'
But I don't know how to get its positional index.
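One way to read "the reset_index() value" is the row's 0-based position. A minimal sketch, assuming the mask has at least one True, that gets that position two ways: through reset_index() directly, and through NumPy's flatnonzero on the mask. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)
mask = df.a > df.b

# reset_index() gives a fresh 0..n-1 index. The mask still carries the old
# labels (100..106), so pass it as a plain array to avoid label alignment.
pos_via_reset = df.reset_index()[mask.to_numpy()].index[0]

# flatnonzero returns the positions of all True entries; take the first.
pos_via_numpy = np.flatnonzero(mask.to_numpy())[0]

# The original label of the same row, for comparison.
label = mask[mask].index[0]

print(pos_via_reset, pos_via_numpy, label)  # 3 3 103
```

Both indexing calls raise an IndexError if the mask has no True value, so check mask.any() first when that can happen.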
The code below evaluates each row with .apply() and, where a > b, returns the row's positional (0-based) index via df.index.get_loc(). The result is written to a new column c. Note that it fills c for every row where a > b, not only the first; in this data only row 103 qualifies.
import numpy as np

# For each row, return its 0-based position if a > b, else NaN.
df['c'] = df.apply(
    lambda row: df.index.get_loc(row.name) if row.a > row.b else np.nan,
    axis=1
)
Result:
        a      b    c
100   100   1000  NaN
101  1123  11123  NaN
102   123   1123  NaN
103   100      0  3.0
104     1     55  NaN
105     0      0  NaN
106     1      1  NaN
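The apply() call above runs a Python lambda once per row. A vectorized sketch that avoids that and, unlike the apply() version, fills only the first matching row, reusing the question's mask.cumsum().eq(1) & mask idea with np.where and np.arange(len(df)) as the positions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)

mask = df['a'] > df['b']
# Keep only the first True, as in the question.
first = mask & mask.cumsum().eq(1)

# Position 0..n-1 on the first matching row, NaN elsewhere (float column).
df['c'] = np.where(first, np.arange(len(df)), np.nan)

print(df)
```

Dropping the `& mask.cumsum().eq(1)` part gives the same every-match behavior as the apply() version.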
The following code marks the first matching row, and it can be adapted to find the second or third match as well by comparing cumcount() against 1 or 2 instead of 0:
cond1 = df['a'] > df['b']
cond2 = df.groupby(cond1).cumcount().eq(0)
df.loc[cond1 & cond2, 'c'] = 'the first row'
df:
        a      b              c
100   100   1000            NaN
101  1123  11123            NaN
102   123   1123            NaN
103   100      0  the first row
104     1     55            NaN
105     0      0            NaN
106     1      1            NaN
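A sketch of that nth-match adaptation. Since a > b matches only one row here, the example switches to a >= b purely for illustration (it matches rows 103, 105 and 106). The helper name nth_true_label is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)

def nth_true_label(cond: pd.Series, n: int):
    """Hypothetical helper: index label of the n-th (1-based) True in cond, or None."""
    # cumcount numbers rows 0, 1, 2, ... separately within the True and False groups.
    order = cond.groupby(cond).cumcount()
    hits = cond & order.eq(n - 1)
    return hits.idxmax() if hits.any() else None

cond = df['a'] >= df['b']  # illustration only: True at 103, 105 and 106
print(nth_true_label(cond, 1), nth_true_label(cond, 2),
      nth_true_label(cond, 3), nth_true_label(cond, 4))
# 103 105 106 None
```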
If you are only looking for the first value, the following code may be simpler:
df.loc[df['a'].gt(df['b']).cummax().cumsum().eq(1), 'c'] = 'the first row'
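To see why gt().cummax().cumsum().eq(1) picks out exactly the first match, here is a sketch that prints each intermediate step side by side (the intermediate names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)

gt = df['a'].gt(df['b'])  # True only where a > b
seen = gt.cummax()        # stays True from the first match onward
running = seen.cumsum()   # 0 before the first match, then 1, 2, 3, ...
first = running.eq(1)     # True only on the first match

print(pd.DataFrame({'gt': gt, 'cummax': seen, 'cumsum': running, 'eq1': first}))
```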
Updated answer
If you only want the positional index, use the following code:
cond1 = df['a'] > df['b']
idx = cond1.idxmax()
loc = df.index.get_loc(idx)
loc:
3
df.loc[df.index == idx, 'c'] = loc
df:
        a      b    c
100   100   1000  NaN
101  1123  11123  NaN
102   123   1123  NaN
103   100      0    3
104     1     55  NaN
105     0      0  NaN
106     1      1  NaN
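One caveat with idxmax(): on an all-False mask it returns the first label (here 100) rather than failing, so get_loc() would silently report position 0. A minimal guarded sketch; the helper name first_true_position is hypothetical:

```python
import pandas as pd

def first_true_position(cond: pd.Series):
    """Hypothetical helper: 0-based position of the first True in cond, or None."""
    if not cond.any():
        # idxmax() on an all-False mask would return the first label instead.
        return None
    return cond.index.get_loc(cond.idxmax())

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)

print(first_true_position(df['a'] > df['b']))  # 3
print(first_true_position(df['a'] > 10**6))    # None
```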