Backfilling columns by groups in Pandas

Question

I have a CSV like this:

A,B,C,D
1,2,,
1,2,30,100
1,2,40,100
4,5,,
4,5,60,200
4,5,70,200
8,9,,

In rows 1 and 4 the C value is missing (NaN). I want to fill each one with the value from rows 2 and 5 respectively, i.e. the first occurrence with the same A,B values.

If no matching row is found, just put 0 (as in the last line). Expected output:

A,B,C,D
1,2,30,
1,2,30,100
1,2,40,100
4,5,60,
4,5,60,200
4,5,70,200
8,9,0,

Using fillna I found bfill ("use NEXT valid observation to fill gap"), but the next observation has to be chosen logically (by matching the A,B values), not simply taken from the next value in column C.

cs95 · Accepted Answer

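You'll have to call df.groupby on A and B first and then apply bfill within each group; fillna(0) then covers groups that have no valid C value at all. Note that reset_index(drop=True) realigns the result by position, so this relies on the rows already being ordered by group, as they are here:

In [501]: df.C = df.groupby(['A', 'B']).apply(lambda x: x.C.bfill()).reset_index(drop=True).fillna(0).astype(int)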

In [502]: df
Out[502]: 
   A  B   C      D
0  1  2  30    NaN
1  1  2  30  100.0
2  1  2  40  100.0
3  4  5  60    NaN
4  4  5  60  200.0
5  4  5  70  200.0
6  8  9   0    NaN
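If the rows of a group are not contiguous, the positional realignment above can assign values to the wrong rows. A minimal sketch of an index-preserving variant, assuming a small interleaved sample (the data below is made up for illustration and is not from the question), uses transform so each filled value stays on its original row label:

```python
import io
import pandas as pd

# Hypothetical sample in which the (A, B) groups are interleaved.
csv_text = """A,B,C,D
4,5,,
1,2,,
4,5,60,200
1,2,30,100
8,9,,
"""
df = pd.read_csv(io.StringIO(csv_text))

# transform returns a Series aligned to df's original index,
# so no reset_index is needed and row order does not matter.
filled = df.groupby(["A", "B"])["C"].transform(lambda s: s.bfill())

# Groups with no valid C at all (here 8,9) are still NaN; use 0 for those.
df["C"] = filled.fillna(0).astype(int)
print(df)
```

Because the result is matched by index label rather than by position, the same code works whether or not the frame is sorted by A and B.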

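You can also select C on the grouped object and call its bfill method directly, which avoids the Python-level lambda and should be faster. The result keeps the original index, so no reset_index is needed: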

In [508]: df.C = df.groupby(['A', 'B']).C.bfill().fillna(0).astype(int); df
Out[508]: 
   A  B   C      D
0  1  2  30    NaN
1  1  2  30  100.0
2  1  2  40  100.0
3  4  5  60    NaN
4  4  5  60  200.0
5  4  5  70  200.0
6  8  9   0    NaN
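For anyone who wants to reproduce this outside an interactive session, here is a self-contained sketch of the same groupby-then-bfill approach. It assumes the CSV from the question is supplied as an in-memory string; in practice the path to the real file would be passed to read_csv instead.

```python
import io
import pandas as pd

# The sample CSV from the question, inlined so the script is runnable as-is.
csv_text = """A,B,C,D
1,2,,
1,2,30,100
1,2,40,100
4,5,,
4,5,60,200
4,5,70,200
8,9,,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Backfill C within each (A, B) group, then put 0 where the group
# has no later valid value to copy from.
df["C"] = df.groupby(["A", "B"])["C"].bfill().fillna(0).astype(int)
print(df)
```

Note that bfill only looks forward within a group, so a missing C that comes after the last valid value in its group would also end up as 0.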

If you wish to get rid of the NaNs in D, you could replace them with empty strings (note that this turns D into an object column):

df['D'] = df['D'].fillna('')
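If the aim is to write the result back out in the shape of the expected output from the question, an alternative worth considering is to keep D numeric. The sketch below is an assumption about that use case rather than part of the answer above: it converts D to pandas' nullable Int64 dtype so that missing values are written as empty fields and the present ones are written without a trailing ".0".

```python
import io
import pandas as pd

csv_text = """A,B,C,D
1,2,,
1,2,30,100
1,2,40,100
4,5,,
4,5,60,200
4,5,70,200
8,9,,
"""
df = pd.read_csv(io.StringIO(csv_text))
df["C"] = df.groupby(["A", "B"])["C"].bfill().fillna(0).astype(int)

# Nullable integer dtype: missing values stay missing instead of
# forcing the whole column to float or to strings.
df["D"] = df["D"].astype("Int64")

# With no path given, to_csv returns the CSV text; missing values
# are written as empty fields by default.
out = df.to_csv(index=False)
print(out)
```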

Tags: python, pandas, dataframe

pythonRcpp
