This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107)
)
And this is the expected output. I want to create column c:
        a      b    c
100   100   1000  NaN
101  1123  11123  NaN
102   123   1123  NaN
103   100      0  3.0
104     1     55  NaN
105     0      0  NaN
106     1      1  NaN
The mask that is used is:
mask = (df.a > df.b)
I want to get the position of the first row where the mask is True. I want to preserve the original index but get the position the row would have after reset_index(). In this example the first row matching the mask is at position 3 (index label 103).
I can mark the first row matching the mask like this:
df.loc[mask.cumsum().eq(1) & mask, 'c'] = 'the first row'
But I don't know how to get its position.
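The code below evaluates each row of the DataFrame using .apply(), and when the condition a > b is met, it returns the positional (0-based) index of the row. The results are written to a new column 'c'. Note that this marks every row where a > b, not only the first; in this example only one row matches.
import numpy as np

# row.name is the index label; get_loc converts it to a 0-based position
df['c'] = df.apply(
    lambda row: df.index.get_loc(row.name) if row.a > row.b else np.nan,
    axis=1
)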
Result:
        a      b    c
100   100   1000  NaN
101  1123  11123  NaN
102   123   1123  NaN
103   100      0  3.0
104     1     55  NaN
105     0      0  NaN
106     1      1  NaN
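As a possible vectorized alternative to the row-wise .apply() above, the sketch below builds the positions with np.arange(len(df)) and keeps only the first row where the mask is True. It assumes the sample data from the question; the names positions and first are illustrative, and the mask.cumsum().eq(1) & mask idiom is taken from the question itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107),
)

mask = df['a'] > df['b']

# 0-based position of every row, independent of the index labels
positions = np.arange(len(df))

# True only on the first row where the mask is True
first = mask & mask.cumsum().eq(1)

# Write the position on that row and NaN everywhere else
df['c'] = np.where(first, positions, np.nan)
```

Because the positions come from np.arange rather than from a label lookup, this should also work when the index contains duplicate labels, which is an assumption worth checking against your own data.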
Code
This code can be modified to find the second and third matching rows as well, not only the first.
cond1 = df['a'] > df['b']
cond2 = df.groupby(cond1).cumcount().eq(0)
df.loc[cond1 & cond2, 'c'] = 'the first row'
df:
        a      b              c
100   100   1000            NaN
101  1123  11123            NaN
102   123   1123            NaN
103   100      0  the first row
104     1     55            NaN
105     0      0            NaN
106     1      1            NaN
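To illustrate how the cumcount() approach might be extended to the second and third matches, here is a minimal sketch. The sample data in the question has only one matching row, so df2 below is hypothetical data made up for this example, and nth is an illustrative name.

```python
import pandas as pd

# Hypothetical data in which three rows satisfy a > b
df2 = pd.DataFrame(
    {'a': [1, 5, 2, 9, 7], 'b': [3, 4, 8, 1, 0]},
    index=range(100, 105),
)

cond1 = df2['a'] > df2['b']  # True at labels 101, 103 and 104

# cumcount() numbers the rows inside each group (True and False) from 0,
# so within the True group 0 is the first match, 1 the second, 2 the third
nth = df2.groupby(cond1).cumcount()

df2.loc[cond1 & nth.eq(1), 'c'] = 'the second row'
df2.loc[cond1 & nth.eq(2), 'c'] = 'the third row'
```

With this data the second match is at label 103 and the third at label 104; the other rows stay NaN.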
If you are only looking for the first value, the following code may be simpler:
df.loc[df['a'].gt(df['b']).cummax().cumsum().eq(1), 'c'] = 'the first row'
Updated Answer
If you only want the positional location of the index, use the following code:
cond1 = df['a'] > df['b']
idx = cond1.idxmax()
loc = df.index.get_loc(idx)
loc:
3
df.loc[df.index == idx, 'c'] = loc
df:
        a      b    c
100   100   1000  NaN
101  1123  11123  NaN
102   123   1123  NaN
103   100      0    3
104     1     55  NaN
105     0      0  NaN
106     1      1  NaN
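One caveat with idxmax(): on a boolean Series with no True values it still returns the first index label, so the code above would report position 0 even when no row satisfies a > b. A minimal sketch of a guard is shown below; first_match_position is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

def first_match_position(mask):
    """Return the 0-based position of the first True in mask, or None if there is none."""
    if not mask.any():
        return None
    # argmax on a boolean array gives the position of the first True
    return int(np.argmax(mask.to_numpy()))

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
    },
    index=range(100, 107),
)

pos = first_match_position(df['a'] > df['b'])    # a match exists
none = first_match_position(df['a'] > 10**6)     # no row matches

df['c'] = np.nan
if pos is not None:
    # Assign by position so the original index labels stay untouched
    df.iloc[pos, df.columns.get_loc('c')] = pos
```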