I am trying to interpolate time series data, df, which looks like:
         id      data        lat      notes    analysis_date
0  17358709       NaN  26.125979      None     2019-09-20 12:00:00+00:00
1  17358709       NaN  26.125979      None     2019-09-20 12:00:00+00:00
2  17352742 -2.331365  26.125979      None     2019-09-20 12:00:00+00:00
3  17358709 -4.424366  26.125979      None     2019-09-20 12:00:00+00:00
I tried df.groupby(['lat', 'lon']).apply(lambda group: group.interpolate(method='linear')), and it raises ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear
I suspect the issue is that I have None values, which I do not want to interpolate. What is the solution?
df.dtypes gives me:
id                                                                int64
data                                                            float64
lat                                                             float64
notes                                                            object
analysis_date         datetime64[ns, psycopg2.tz.FixedOffsetTimezone...
dtype: object
DataFrame.interpolate has issues with timezone-aware datetime64[ns, tz] columns, which leads to that rather cryptic error message. For example:
import pandas as pd
df = pd.DataFrame({'time': pd.to_datetime(['2010', '2011', 'foo', '2012', '2013'], 
                                          errors='coerce')})
df['time'] = df.time.dt.tz_localize('UTC').dt.tz_convert('Asia/Kolkata')
df.interpolate()
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear
In this case interpolating that column is unnecessary, so interpolate only the column you need. We still want DataFrame.interpolate, so select the column with [[ ]] (Series.interpolate leads to some odd reshaping):
df['data'] = df.groupby(['lat', 'lon']).apply(lambda x: x[['data']].interpolate())
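If the apply-based assignment misbehaves on your pandas version, one alternative is groupby(...).transform, which returns a result aligned to the original index. The following is only a minimal sketch under assumptions: the toy frame, its values, and the 'lon' column are made up for illustration (the question groups by 'lon' but does not show it), and transform is a swapped-in technique rather than the apply call above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two (lat, lon) groups, each with one gap in 'data'.
df = pd.DataFrame({
    "lat": [26.1, 26.1, 26.1, 30.0, 30.0, 30.0],
    "lon": [-80.0, -80.0, -80.0, -90.0, -90.0, -90.0],  # assumed column
    "data": [1.0, np.nan, 3.0, 10.0, np.nan, 20.0],
    "notes": [None] * 6,
    "analysis_date": pd.to_datetime(["2019-09-20 12:00"] * 6).tz_localize("UTC"),
})

# Interpolate only 'data' within each group; the tz-aware datetime column
# and the object column holding None are never passed to interpolate().
df["data"] = (
    df.groupby(["lat", "lon"])["data"]
      .transform(lambda s: s.interpolate(method="linear"))
)

print(df[["lat", "lon", "data"]])
```

Because only the float column is interpolated, the None values in notes are left untouched, which is what the question asks for.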