I'm looking to take a pandas series and fill NaN with the average of the next numerical value where: average = next numerical value / (# consecutive NaNs + 1)
Here's my code so far, I just can't figure out how to divide the filler column among the NaNs (and the next numerical value as well) in num:
import pandas as pd
dates = pd.date_range(start = '1/1/2016',end = '1/12/2016', freq = 'D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]
df = pd.DataFrame({
        'date':dates, 
        'num':nums
        })
df['filler'] = df['num'].fillna(method = 'bfill')
Current Output:
         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    39.0
3  2016-01-04   NaN    39.0
4  2016-01-05  39.0    39.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    60.0
8  2016-01-09   NaN    60.0
9  2016-01-10   NaN    60.0
10 2016-01-11   NaN    60.0
11 2016-01-12  60.0    60.0
Desired Output:
         date   num
0  2016-01-01  10.0
1  2016-01-02  12.0
2  2016-01-03  13.0
3  2016-01-04  13.0
4  2016-01-05  13.0
5  2016-01-06  10.0
6  2016-01-07  11.0
7  2016-01-08  12.0
8  2016-01-09  12.0
9  2016-01-10  12.0
10 2016-01-11  12.0
11 2016-01-12  12.0
cumsum of notnull
groupby and transform with mean
csum = df.num.notnull()[::-1].cumsum()
filler = df.num.fillna(0).groupby(csum).transform('mean')
df.assign(filler=filler)
         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    13.0
3  2016-01-04   NaN    13.0
4  2016-01-05  39.0    13.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    12.0
8  2016-01-09   NaN    12.0
9  2016-01-10   NaN    12.0
10 2016-01-11   NaN    12.0
11 2016-01-12  60.0    12.0
how it works
df.num.notnull().cumsum() is a standard technique to find groups of contiguous nulls.  However, I wanted my groups to end with the next numeric value.  So I reversed the series and then cumsum'd.transform to broadcast across the existing indexassign new column.  Despite having reversed the series, the index will realign like magic.  Could have used loc but that overwrites the existing df.  I'll let OP decide to overwrite if they want to.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With