
Pandas dataframe insert missing row and fill with previous row

I have a dataframe as below:

import pandas as pd
import numpy as np

df=pd.DataFrame({'id':[0,1,2,4,5],
                'A':[0,1,0,1,0],
                'B':[None,None,1,None,None]})
   id  A    B
0   0  0  NaN
1   1  1  NaN
2   2  0  1.0
3   4  1  NaN
4   5  0  NaN

Note that the vast majority of values in column B are NaN.

The id column increments by 1, so one row (id 3) is missing between ids 2 and 4.
The row to insert should be a copy of the previous row, except for the id column.

So the expected result is:

    id  A   B
0   0   0.0 NaN
1   1   1.0 NaN
2   2   0.0 1.0
3   3   0.0 1.0 <-add row here
4   4   1.0 NaN
5   5   0.0 NaN

I can do this for column A, but I don't know how to handle column B, because ffill would also fill 1.0 into the rows with ids 4 and 5, which is incorrect.

step=1
idx=np.arange(df['id'].min(), df['id'].max() + step, step)
df=df.set_index('id').reindex(idx).reset_index()
df['A']=df["A"].ffill()

EDIT:
Sorry, I forgot one situation.
Column B can contain different values.
When the DataFrame is as below:

   id  A    B
0   0  0  NaN
1   1  1  NaN
2   2  0  1.0
3   4  1  NaN
4   5  0  NaN
5   6  1  2.0
6   9  0  NaN
7   10 1  NaN

the result would be:

   id  A    B
0   0  0  NaN
1   1  1  NaN
2   2  0  1.0
3   3  0  1.0
4   4  1  NaN
5   5  0  NaN
6   6  1  2.0
7   7  1  2.0
8   8  1  2.0
9   9  0  NaN
10  10 1  NaN
atiAkizuki asked Oct 30 '25 11:10


2 Answers

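Keep a copy of the original ids before reindexing, then use update with isin so that only the newly inserted rows receive the forward-filled B values:

s = df['id'].copy()  # change 1: remember which ids existed before reindexing
step = 1
idx = np.arange(df['id'].min(), df['id'].max() + step, step)
df = df.set_index('id').reindex(idx).reset_index()
df['A'] = df['A'].ffill()

# change 2: forward-fill B, mask (set to NaN) the rows whose id already existed,
# then write the remaining non-NaN values into B
df.B.update(df.B.ffill().mask(df.id.isin(s)))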
df
   id    A    B
0   0  0.0  NaN
1   1  1.0  NaN
2   2  0.0  1.0
3   3  0.0  1.0
4   4  1.0  NaN
5   5  0.0  NaN
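
Calling update through attribute access (df.B.update(...)) depends on df.B sharing memory with the frame, so it may leave df unchanged in pandas versions where Copy-on-Write is enabled. Below is a minimal sketch of the same idea that assigns the result back explicitly. It assumes the ids are unique integers; the helper name fill_missing_ids is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_missing_ids(df, step=1):
    """Insert a row for every missing id, copying the previous row into it."""
    idx = np.arange(df["id"].min(), df["id"].max() + step, step)
    out = df.set_index("id").reindex(idx)

    # Rows whose id was not present in the input are the inserted ones.
    inserted = ~out.index.isin(df["id"])

    # Forward-fill a copy, then take the filled values for inserted rows only,
    # so NaN values in the original rows stay untouched.
    filled = out.ffill()
    out.loc[inserted] = filled.loc[inserted]

    return out.rename_axis("id").reset_index()


# The second example from the question (ids 3, 7 and 8 are missing).
df = pd.DataFrame({
    "id": [0, 1, 2, 4, 5, 6, 9, 10],
    "A": [0, 1, 0, 1, 0, 1, 0, 1],
    "B": [None, None, 1, None, None, 2, None, None],
})
result = fill_missing_ids(df)
print(result)
```

As with the reindex approach above, column A comes back as float because the reindex step temporarily introduces NaN.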
BENY answered Nov 02 '25 01:11


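If I understand correctly, here is some sample code.

# Build the full range of ids from 0 to the largest id
new_df = pd.DataFrame({
    'new_id': list(range(df['id'].max() + 1)),
})

# An outer merge adds a row for every id missing from df
df = df.merge(new_df, how='outer', left_on='id', right_on='new_id')
df = df.sort_values('new_id')

# Forward-fill all columns from the previous row
df = df.ffill()

df = df.drop(columns='id')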

df
    A   B   new_id
0   0.0 NaN 0
1   1.0 NaN 1
2   0.0 1.0 2
5   0.0 1.0 3
3   1.0 1.0 4
4   0.0 1.0 5
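
Note that df.ffill() in the code above fills every row, so B also becomes 1.0 for ids 4 and 5, which is the behaviour the question wanted to avoid. Below is a hedged variant of the merge approach that restricts the fill to the inserted rows. It uses the indicator parameter of merge; the names all_ids, inserted and value_cols are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 4, 5],
    "A": [0, 1, 0, 1, 0],
    "B": [None, None, 1, None, None],
})

# Every id that should exist, from the smallest to the largest observed id.
all_ids = pd.DataFrame(
    {"id": range(int(df["id"].min()), int(df["id"].max()) + 1)}
)

# indicator=True adds a "_merge" column; "left_only" marks ids absent from df.
merged = all_ids.merge(df, on="id", how="left", indicator=True)
inserted = merged["_merge"] == "left_only"

# Copy forward-filled values into the inserted rows only.
value_cols = ["A", "B"]
merged.loc[inserted, value_cols] = merged[value_cols].ffill().loc[inserted]

result = merged.drop(columns="_merge")
print(result)
```

A left merge from the complete id list keeps the rows in id order, so no separate sort is needed before forward-filling.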
Andrew Li answered Nov 02 '25 03:11


