Interpolation still leaving NaN's (pandas groupby)

Question

I have a Dataframe with the location of some customers (so I have a column with Customer_id and others with Lat and Lon) and I am trying to interpolate the NaN's according to each customer.

For example, if I interpolate with the nearest approach here (I made up the values here):

 Customer_id   Lat    Lon
   A            1      1
   A            NaN    NaN  
   A            2      2      
   B            NaN    NaN
   B            4      4

I would like the NaN for B to be 4 and not 2.

I have tried this

series.groupby('Customer_id').apply(lambda group: group.interpolate(method = 'nearest', limit_direction = 'both'))

And the number of NaN's goes down from 9003 to 94. But I'm not understanding why it is still leaving some missing values.

I checked and these 94 missing values corresponded to records from customers that were already being interpolated. For example,

    Customer_id   Lat
 0.    A           1
 1.    A           NaN
 2.    A           NaN
 3.    A           NaN
 4.    A           NaN

It would interpolate correctly until some value (let's say it interpolates 1, 2 and 3 correctly) and then leaves 4 as NaN.

I have tried to set a limit in interpolate greater than the maximum number of records per client but it is still not working out. I don't know where my mistake is, can somebody help out?

(I don't know if it's relevant to mention or not but I fabricated my own NaN's for this. This is the code I used Replace some values in a dataframe with NaN's if the index of the row does not exist in another dataframe I think the problem isn't here but since I'm very confused as to where the issue actually is I'll just leave it here)

ALollz · Accepted Answer

When you interpolate with nearest it is only able to fill in-between missing values. (You'll notice this because you get an error when there's only 1 non-null value, like in your example). The remaining null values are "edges" which are taken care of with .bfill().ffill() for the nearest logic. This is also the appropriate logic to "interpolate" with only one non-missing value.

def my_interp(x):
    if x.notnull().sum() > 1:
        return x.interpolate(method='nearest').ffill().bfill()
    else:
        return x.ffill().bfill()

df.groupby('Customer_id').transform(my_interp)

#   Lat  Lon
#0  1.0  1.0
#1  1.0  1.0
#2  2.0  2.0
#3  4.0  4.0
#4  4.0  4.0

Interpolation still leaving NaN's (pandas groupby)

Tags:

python

python-3.x

pandas

dataframe

interpolation

Nocas

1 Answers

ALollz

Recent Activity

Donate For Us

Interpolation still leaving NaN's (pandas groupby)

Tags:

python

python-3.x

pandas

dataframe

interpolation

Nocas

1 Answers

ALollz

Related questions

Recent Activity

Donate For Us