This is my dataframe:
import pandas as pd
df = pd.DataFrame(
{
'a': [100, 1123, 123, 100, 1, 0, 1],
'b': [1000, 11123, 1123, 0, 55, 0, 1],
'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
}
)
And this is the output that I want. I want to create column x:
a b c x
0 100 1000 a NaN
1 1123 11123 b NaN
2 123 1123 c NaN
3 100 0 d NaN
4 1 55 e e
5 0 0 f NaN
6 1 1 g NaN
By using a mask:
mask = (
(df.a > df.b)
)
First of all I need to find the first occurrence of this mask which in my example is row number 3. Then I want to move one row below it and use the value in column c to create column x.
So in my example, the first occurrence of mask is row 3. One row after it is row 4. That is why e is selected for column x.
Note that in row 4 which is one row after the mask, no condition is needed. For example for row 4, It is NOT necessary that df.a > df.b.
This is what I have tried:
df.loc[mask.cumsum().eq(1) & mask, 'x'] = df.c.shift(-1)
I provide some additional dfs for convenience to test whether the code works in other examples. For instance what if there are no cases that meet the conditions of mask. In that case I just want a column of NaN for x.
df = pd.DataFrame({'a': [1000, 11230, 12300, 10000, 1000, 10000, 100000], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [1, 1, 1, -1, -1, -1, -1], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [-1, -1, -1, -1, -1, -1, 100000], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
You can generate a mask that indicates one location past where the first value of a is greater than b:
mask = (df.a > df.b).shift(fill_value=False)
mask = mask & ~mask.cumsum().shift().astype(bool)
You can then use that mask to set the value of x equal to c:
df.loc[mask, 'x'] = df['c']
Output for each of your dfs:
a b c x
0 100 1000 a NaN
1 1123 11123 b NaN
2 123 1123 c NaN
3 100 0 d NaN
4 1 55 e e
5 0 0 f NaN
6 1 1 g NaN
a b c x
0 1000 1000 a NaN
1 11230 11123 b NaN
2 12300 1123 c c
3 10000 0 d NaN
4 1000 55 e NaN
5 10000 0 f NaN
6 100000 1 g NaN
a b c x
0 1 1000 a NaN
1 1 11123 b NaN
2 1 1123 c NaN
3 -1 0 d NaN
4 -1 55 e NaN
5 -1 0 f NaN
6 -1 1 g NaN
a b c x
0 -1 1000 a NaN
1 -1 11123 b NaN
2 -1 1123 c NaN
3 -1 0 d NaN
4 -1 55 e NaN
5 -1 0 f NaN
6 100000 1 g NaN
More generically, you can use cummax and shift by N to select the next N values:
N = 3
mask = (df.a > df.b).shift(fill_value=False).cummax()
mask = mask & ~mask.cumsum().shift(N, fill_value=0).astype(bool)
A generic approach could be to cummax+cumsum then select the desired position(s):
N = 3
mask = ((df.a > df.b)
.cummax().cumsum()
.between(2, N+1)
)
df.loc[mask, 'x'] = df.loc[mask, 'c']
Output:
a b c x
0 100 1000 a NaN
1 1123 11123 b NaN
2 123 1123 c NaN
3 100 0 d NaN
4 1 55 e e
5 0 0 f f
6 1 1 g g
Intermediates:
a b c x cumsum mask
0 100 1000 a NaN 0 False
1 1123 11123 b NaN 0 False
2 123 1123 c NaN 0 False
3 100 0 d NaN 1 False
4 1 55 e e 2 True
5 0 0 f f 3 True
6 1 1 g g 4 True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With