Pandas: Create column which contains 'next' changed value in another column

I would like to create column C from column B without a for loop...

dataframe:

# |  A  |  B |  C  
--+-----+----+-----
1 |  2  |  3 |  4
2 |  3  |  3 |  4
3 |  4  |  4 |  6
4 |  5  |  4 |  6
5 |  5  |  4 |  6
6 |  3  |  6 |  2
7 |  2  |  6 |  2
8 |  4  |  2 |  3  # <-- loops back around if possible (the B value in row 1)

Essentially I want to get the value of the next change in B and set it to a new column C.

So far, using the answer to "Determining when a column value changes in pandas dataframe", I have:

df_filtered = df[df['B'].diff() != 0]

But after that I'm not sure how to create C without using a loop...
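One way to finish the df_filtered idea without a loop is sketched below. It assumes the DataFrame has a unique index and applies the wrap-around rule from the table comment, so the last run points back to the first B value. The idea is to shift the filtered B values up by one run, put them back on their run-start rows with reindex, and forward-fill them over the rest of each run.

```python
import pandas as pd

# Column B from the first example table (A plays no part in computing C)
df = pd.DataFrame({"B": [3, 3, 4, 4, 4, 6, 6, 2]})

# Rows where B changes, i.e. the first row of each run (the filter from above)
df_filtered = df[df["B"].diff() != 0]

# For each run start, the B value of the following run; the last run
# wraps around to the first B value
next_at_start = df_filtered["B"].shift(-1, fill_value=df["B"].iloc[0])

# Put those values back on their run-start rows, then carry them forward
# to the remaining rows of each run
df["C"] = next_at_start.reindex(df.index).ffill().astype(df["B"].dtype)
print(df["C"].tolist())  # [4, 4, 6, 6, 6, 2, 2, 3]
```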

EDIT: Ayoub ZAROU's answer solves my original question. However, I noticed that my example dataframe doesn't cover all cases if we assume the data loops around:

# |  A  |  B |  C  
--+-----+----+-----
1 |  2  |  3 |  4
2 |  3  |  3 |  4
3 |  4  |  4 |  6
4 |  5  |  4 |  6
5 |  5  |  4 |  6
6 |  3  |  6 |  2
7 |  2  |  6 |  2
8 |  4  |  2 |  3
9 |  3  |  3 |  4
10|  2  |  3 |  4

In this case, if the final run of 3s is treated as part of the first run of 3s, this solution gives the wrong values for the last two rows of C (3 instead of 4).

An easy fix, however, is to move the last few rows to the beginning (or vice versa) before computing C.
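A minimal sketch of that rotate-then-compute fix follows. It assumes that trailing rows equal to the first B value belong to the first run and that B changes at least once. `next_change` is a hypothetical helper name, not part of pandas. It counts those trailing rows (k) and rolls them to the front so every run is contiguous. It then looks up each run's next value, with the last run wrapping to the first, and rolls the result back by k.

```python
import numpy as np
import pandas as pd


def next_change(b: pd.Series) -> pd.Series:
    """Hypothetical helper: value of the next run of B, treating the data as circular."""
    values = b.to_numpy()
    n = len(values)
    # Positions whose value differs from the first row
    differs = np.flatnonzero(values != values[0])
    if differs.size == 0:
        raise ValueError("B never changes, so there is no next value")
    # k = number of trailing rows that continue the first run
    k = n - 1 - differs[-1]
    # Rotate those trailing rows to the front so every run is contiguous
    rotated = np.roll(values, k)
    starts = np.concatenate(([True], rotated[1:] != rotated[:-1]))
    run_id = np.cumsum(starts) - 1                  # 0-based run number per row
    next_run_value = np.roll(rotated[starts], -1)   # last run wraps to the first
    # Look up each row's next value, then undo the rotation
    c = np.roll(next_run_value[run_id], -k)
    return pd.Series(c, index=b.index, name="C")


# Column B from the second example table
df = pd.DataFrame({"B": [3, 3, 4, 4, 4, 6, 6, 2, 3, 3]})
df["C"] = next_change(df["B"])
print(df["C"].tolist())  # [4, 4, 6, 6, 6, 2, 2, 3, 4, 4]
```

On the first table k is 0, so no rotation happens and the result matches the non-circular answers.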

Kyle asked Oct 16 '25 02:10



2 Answers

You could try the following. Note that np.roll is similar to shift in pandas. The only difference is that it rolls the values over (wraps them around) instead of filling the vacated position with NaN. In the following, c marks the rows where B does not change in the next row:

c = (df.B.diff(-1) == 0)

c
Out[104]: 
0     True
1    False
2     True
3     True
4    False
5     True
6    False
7    False
Name: B, dtype: bool
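To make the np.roll vs. shift difference concrete, here is a small side-by-side comparison on the same B values. It is an illustration only and not part of the answer's pipeline.

```python
import numpy as np
import pandas as pd

b = pd.Series([3, 3, 4, 4, 4, 6, 6, 2], name="B")

# shift(-1): each row gets the next row's value; the last row gets NaN (dtype becomes float)
shifted = b.shift(-1)
print(shifted.tolist())  # [3.0, 4.0, 4.0, 4.0, 6.0, 6.0, 2.0, nan]

# np.roll(..., -1): same move, but the first value wraps around to the last row
rolled = np.roll(b.to_numpy(), -1)
print(rolled.tolist())   # [3, 4, 4, 4, 6, 6, 2, 3]

# For a one-step move, shift with fill_value gives the same wrap-around result
print(b.shift(-1, fill_value=b.iloc[0]).tolist())  # [3, 4, 4, 4, 6, 6, 2, 3]
```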

We then set the rows where c is False (i.e. where B changes in the next row) to the next value of B, obtained with np.roll and applied with pandas.Series.where. Note that where keeps the values where c is True and replaces them where it is False:

df['C'] = np.nan
df['C'] = df.C.where(c, np.roll(df.B, -1))
df.C

Out[107]: 
0    NaN
1    4.0
2    NaN
3    NaN
4    6.0
5    NaN
6    2.0
7    3.0
Name: C, dtype: float64

We then fill the remaining rows with bfill and cast the result to the dtype of column B. So, altogether:

import numpy as np

c = (df.B.diff(-1) == 0)
df['C'] = np.nan
df['C'] = df.C.where(c, np.roll(df.B, -1)).bfill().astype(df.B.dtype)

df.C
Out[110]: 
0    4
1    4
2    6
3    6
4    6
5    2
6    2
7    3
Name: C, dtype: int32
Ayoub ZAROU answered Oct 18 '25 17:10



Another way is to number the runs of equal values in B (a new run starts wherever the value changes):

In [11]: changes = (df.B != df.B.shift()).cumsum()

In [12]: changes
Out[12]:
0    1
1    1
2    2
3    2
4    2
5    3
6    3
7    4
Name: B, dtype: int64

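and build a lookup of the first value of each run, with the first value of B appended so that the last run wraps around: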

In [13]: lookup = df.B[(df.B != df.B.shift())]

In [14]: lookup.at[len(lookup)] = df.B.iloc[0]

In [15]: lookup
Out[15]:
0    3
2    4
5    6
7    2
4    3
Name: B, dtype: int64

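Then use these to look up the "next" value. Because the run numbers in changes start at 1, position changes[i] in lookup holds the first value of the run that follows row i's run (the appended entry covers the last run):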

In [16]: lookup.iloc[changes]
Out[16]:
2    4
2    4
5    6
5    6
5    6
7    2
7    2
4    3
Name: B, dtype: int64

To create the column, take .values so that pandas does not try to align on the duplicated index labels:

In [17]: df["C"] = lookup.iloc[changes].values
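In the session above, the appended value is stored under the label len(lookup) (4 here). That label is free only because no run starts at row 4. With data such as B = [1, 1, 2, 3, 4], a run does start at row 4, so `.at` would overwrite that entry instead of appending one. The following is a positional sketch of the same lookup idea that avoids index labels entirely. It uses plain NumPy arrays rather than the original Series lookup.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"B": [3, 3, 4, 4, 4, 6, 6, 2]})

starts = df.B != df.B.shift()   # True at the first row of each run
changes = starts.cumsum()       # 1-based run number per row

# First value of each run, plus B's first value appended for the wrap-around
lookup = np.append(df.B[starts].to_numpy(), df.B.iloc[0])

# Run r (1-based) is followed by the run stored at position r in lookup
df["C"] = lookup[changes.to_numpy()]
print(df.C.tolist())  # [4, 4, 6, 6, 6, 2, 2, 3]
```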
Andy Hayden answered Oct 18 '25 17:10



