Pandas: reindex with dates in groupby, filling/maintaining values as appropriate

Question

I have the following DataFrame.

>>> import pandas as pd
>>> df = pd.DataFrame(data={'date': ['2010-05-01', '2010-07-01', '2010-06-01', '2010-10-01'], 'id': [1,1,2,2], 'val': [50,60,70,80], 'other': ['uno', 'uno', 'dos', 'dos']})
>>> df['date'] = pd.to_datetime(df['date'])
>>> df
        date  id other  val
0 2010-05-01   1   uno   50
1 2010-07-01   1   uno   60
2 2010-06-01   2   dos   70
3 2010-10-01   2   dos   80

I want to expand this DataFrame so that it contains rows for all months in 2010.

The DataFrame is grouped by id, so there would be 12 rows for each id, for a total of 24 rows in this case.
If a month is absent from the initial DataFrame, its val should be 0.
other has a 1-to-1 relationship with id, and I would like to keep it that way.

My desired result is the following:

         date  id other  val
0  2010-01-01   1   uno    0
1  2010-02-01   1   uno    0
2  2010-03-01   1   uno    0
3  2010-04-01   1   uno    0
4  2010-05-01   1   uno    50
5  2010-06-01   1   uno    0
6  2010-07-01   1   uno    60
7  2010-08-01   1   uno    0
8  2010-09-01   1   uno    0
9  2010-10-01   1   uno    0
10 2010-11-01   1   uno    0
11 2010-12-01   1   uno    0
12 2010-01-01   2   dos    0
13 2010-02-01   2   dos    0
14 2010-03-01   2   dos    0
15 2010-04-01   2   dos    0
16 2010-05-01   2   dos    0
17 2010-06-01   2   dos    70
18 2010-07-01   2   dos    0
19 2010-08-01   2   dos    0
20 2010-09-01   2   dos    0
21 2010-10-01   2   dos    80
22 2010-11-01   2   dos    0
23 2010-12-01   2   dos    0

What I have tried:

I tried groupby('id') followed by apply, where the applied function reindexes each group. However, I haven't managed to both fill val with zeros and maintain other.

jezrael · Accepted Answer

You can use groupby with a custom function that reindexes each group and then fills the resulting NaNs: in other with ffill and bfill (forward and backward filling), and in val with fillna and a constant:

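def f(x):
    # add a row for every month start in 2010; missing months become NaN
    x = x.reindex(pd.date_range('2010-01-01', '2010-12-01', freq='MS'))
    # other is constant within a group, so fill it in both directions
    x['other'] = x['other'].ffill().bfill()
    x['val'] = x['val'].fillna(0)
    return x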


# select the value columns so the grouping key is not passed to f
df = (df.set_index('date')
        .groupby('id')[['other', 'val']]
        .apply(f).rename_axis(['id', 'date'])
        .reset_index())

print(df)
    id       date other   val
0    1 2010-01-01   uno   0.0
1    1 2010-02-01   uno   0.0
2    1 2010-03-01   uno   0.0
3    1 2010-04-01   uno   0.0
4    1 2010-05-01   uno  50.0
5    1 2010-06-01   uno   0.0
6    1 2010-07-01   uno  60.0
7    1 2010-08-01   uno   0.0
8    1 2010-09-01   uno   0.0
9    1 2010-10-01   uno   0.0
10   1 2010-11-01   uno   0.0
11   1 2010-12-01   uno   0.0
12   2 2010-01-01   dos   0.0
13   2 2010-02-01   dos   0.0
14   2 2010-03-01   dos   0.0
15   2 2010-04-01   dos   0.0
16   2 2010-05-01   dos   0.0
17   2 2010-06-01   dos  70.0
18   2 2010-07-01   dos   0.0
19   2 2010-08-01   dos   0.0
20   2 2010-09-01   dos   0.0
21   2 2010-10-01   dos  80.0
22   2 2010-11-01   dos   0.0
23   2 2010-12-01   dos   0.0
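
To see why f chains ffill() and bfill(), here is a minimal sketch on a single group (the one with id 1). The names group, expanded, only_ffill and both are illustrative and not part of the answer above. A forward fill alone cannot reach the months before the first observed row, so the backward fill is what covers January to April:

```python
import pandas as pd

# All month starts in 2010, as in the answer above.
months = pd.date_range('2010-01-01', '2010-12-01', freq='MS')

# Illustrative stand-in for the group with id == 1, indexed by date.
group = pd.DataFrame(
    {'other': ['uno', 'uno'], 'val': [50, 60]},
    index=pd.to_datetime(['2010-05-01', '2010-07-01']),
)

# Reindexing adds the missing months as rows filled with NaN.
expanded = group.reindex(months)

# Forward fill only propagates values to later rows.
only_ffill = expanded['other'].ffill()

# Adding a backward fill also covers the rows before the first observation.
both = expanded['other'].ffill().bfill()

print(only_ffill.isna().sum())
print(both.isna().sum())
```

With only the forward fill, the months before May still have no value for other; after the backward fill, every row in the group carries 'uno'.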

Another solution is to create a MultiIndex with MultiIndex.from_product and reindex by it:

mux = pd.MultiIndex.from_product([df['id'].unique(),
                                  pd.date_range('2010-01-01', '2010-12-01', freq='MS')], 
                                  names=('id','date'))

df = df.set_index(['id','date']).reindex(mux).reset_index()
df['val'] = df['val'].fillna(0)
# transform keeps the original index, so the result aligns with df
df['other'] = df.groupby('id')['other'].transform(lambda x: x.ffill().bfill())

print(df)
    id       date other   val
0    1 2010-01-01   uno   0.0
1    1 2010-02-01   uno   0.0
2    1 2010-03-01   uno   0.0
3    1 2010-04-01   uno   0.0
4    1 2010-05-01   uno  50.0
5    1 2010-06-01   uno   0.0
6    1 2010-07-01   uno  60.0
7    1 2010-08-01   uno   0.0
8    1 2010-09-01   uno   0.0
9    1 2010-10-01   uno   0.0
10   1 2010-11-01   uno   0.0
11   1 2010-12-01   uno   0.0
12   2 2010-01-01   dos   0.0
13   2 2010-02-01   dos   0.0
14   2 2010-03-01   dos   0.0
15   2 2010-04-01   dos   0.0
16   2 2010-05-01   dos   0.0
17   2 2010-06-01   dos  70.0
18   2 2010-07-01   dos   0.0
19   2 2010-08-01   dos   0.0
20   2 2010-09-01   dos   0.0
21   2 2010-10-01   dos  80.0
22   2 2010-11-01   dos   0.0
23   2 2010-12-01   dos   0.0
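
Both outputs above show val as floats, because the NaNs introduced by reindex force a float dtype, while the desired result in the question has integers. The following is a possible variant of the MultiIndex approach, not the answer's own code: it assumes that passing fill_value=0 to reindex is acceptable for val, and that other can be looked up per id because of the 1-to-1 relationship stated in the question. The names other_by_id and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2010-05-01', '2010-07-01', '2010-06-01', '2010-10-01']),
    'id': [1, 1, 2, 2],
    'val': [50, 60, 70, 80],
    'other': ['uno', 'uno', 'dos', 'dos'],
})

# One 'other' value per id; this relies on the 1-to-1 relationship.
other_by_id = df.drop_duplicates('id').set_index('id')['other']

# Every (id, month start) combination for 2010.
mux = pd.MultiIndex.from_product(
    [df['id'].unique(), pd.date_range('2010-01-01', '2010-12-01', freq='MS')],
    names=['id', 'date'],
)

# Reindex only val and fill the new rows with 0 directly, so no NaN appears.
out = df.set_index(['id', 'date'])[['val']].reindex(mux, fill_value=0).reset_index()

# Restore other from the per-id lookup instead of ffill/bfill.
out['other'] = out['id'].map(other_by_id)

# Column order from the desired result in the question.
out = out[['date', 'id', 'other', 'val']]
print(out)
```

The lookup sidesteps the fill direction entirely, but it silently picks the first other seen for each id if the relationship ever stops being 1-to-1, so the ffill/bfill version may be preferable when that assumption is uncertain.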

Tags: python, pandas, pandas-groupby

gberger
