I have some stocks data in a dataframe that I'm resampling, which results in some NaN values. Here's a section of the raw feed:
In [34]: feeddf
Out[34]:
open high low close volume
date
2017-12-03 07:00:00 14.46 14.46 14.46 14.46 25000
2017-12-03 07:01:00 14.46 14.46 14.46 14.46 20917
2017-12-03 07:06:00 14.50 14.50 14.50 14.50 2000
2017-12-03 07:12:00 14.50 14.56 14.50 14.56 17000
The feed is supposed to be minute-by-minute, but when there's not data available, the row is skipped. When resampling the dataframe and aggregating for the opens, highs, lows, and closes, it looks like this:
In [35]: feeddf.resample('3Min').agg({'open': 'first',
'high': 'max',
'low': 'min',
'close': 'last'})
Out[35]:
open high low close
date
2017-12-03 07:00:00 14.46 14.46 14.46 14.46
2017-12-03 07:03:00 NaN NaN NaN NaN
2017-12-03 07:06:00 14.50 14.50 14.50 14.50
2017-12-03 07:09:00 NaN NaN NaN NaN
2017-12-03 07:12:00 14.50 14.56 14.50 14.56
My question: I want to forward-fill the missing data based on the last row's close
value. df.fillna(method='ffill')
is not helping because it fills it based on the last value on the same column. Any idea?
First forward fill last column close
and then bfill
by columns:
print (df)
open high low close
date
2017-12-03 07:00:00 14.46 14.46 14.46 14.81
2017-12-03 07:03:00 NaN NaN NaN NaN
2017-12-03 07:06:00 14.50 14.50 14.50 14.59
2017-12-03 07:09:00 NaN NaN NaN NaN
2017-12-03 07:12:00 14.50 14.56 14.50 14.56
df['close'] = df['close'].ffill()
df = df.bfill(axis=1)
print (df)
open high low close
date
2017-12-03 07:00:00 14.46 14.46 14.46 14.81
2017-12-03 07:03:00 14.81 14.81 14.81 14.81
2017-12-03 07:06:00 14.50 14.50 14.50 14.59
2017-12-03 07:09:00 14.59 14.59 14.59 14.59
2017-12-03 07:12:00 14.50 14.56 14.50 14.56
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With