
Pandas: fill NaN based on the last available value and the next available value

Tags: python, pandas

I have CSV data that looks like this:

     A     B
0   x aa   
1   z aa   
2          
3   
4   x aa   
5   z bb
6   x bb
7          
8   z cc   

I would like to fill the empty cells in column B with the last token of the value in column A. Where A itself is empty, B should be filled only if, in column A,

last_available_value_before_the_NaNs_in_A.split()[-1] == next_available_value_after_the_NaNs_in_A.split()[-1]

The wanted result would be:

     A     B
0   x aa   aa 
1   z aa   aa
2          aa
3          aa
4   x aa   aa
5   z bb   bb
6   x bb   bb
7          
8   z cc   cc

data.loc[7, 'B'] will be NaN because data.loc[6, 'A'].split()[-1] == data.loc[8, 'A'].split()[-1] is False.

data.loc[5, 'B'] is 'bb' because data.loc[5, 'A'].split()[-1] == 'bb'.

Thanks for your help!

guest_42 asked Jan 27 '26 02:01


1 Answer

You could compare a forward-filled version of column A (ffill) with a backward-filled one (bfill), and fill B only where the two agree:

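f = df['A'].ffill().str.split().str[-1]   # last token of the previous non-empty A
b = df['A'].bfill().str.split().str[-1]   # last token of the next non-empty A
# Keep B where the two tokens differ; otherwise fill it with the shared token.
df['B'] = df['B'].where(f != b, f)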
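
To check the idea against the sample in the question, here is a minimal self-contained sketch. It assumes the empty cells are NaN (as they would be after reading the CSV) and builds the frame by hand instead of reading a file. The explicit assignment back to df['B'] is used in place of inplace=True, since the in-place form on an attribute-accessed column may not update the frame in newer pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data from the question, typed in by hand (assumption: blanks are NaN).
df = pd.DataFrame({
    "A": ["x aa", "z aa", np.nan, np.nan, "x aa", "z bb", "x bb", np.nan, "z cc"],
    "B": pd.Series([np.nan] * 9, dtype="object"),
})

# Last whitespace-separated token of the nearest non-empty A above each row.
f = df["A"].ffill().str.split().str[-1]
# Last whitespace-separated token of the nearest non-empty A below each row.
b = df["A"].bfill().str.split().str[-1]

# where() keeps B where the condition holds (tokens differ) and
# takes the value from f everywhere else (tokens agree).
df["B"] = df["B"].where(f != b, f)

print(df)
```

Row 7 stays empty because the token above it ('bb') differs from the token below it ('cc'); every other row gets the shared token.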
a_guest answered Jan 28 '26 17:01