I have the following DataFrame.
>>> import pandas as pd
>>> df = pd.DataFrame(data={'date': ['2010-05-01', '2010-07-01', '2010-06-01', '2010-10-01'], 'id': [1,1,2,2], 'val': [50,60,70,80], 'other': ['uno', 'uno', 'dos', 'dos']})
>>> df['date'] = pd.to_datetime(df['date'])
>>> df
date id other val
0 2010-05-01 1 uno 50
1 2010-07-01 1 uno 60
2 2010-06-01 2 dos 70
3 2010-10-01 2 dos 80
I want to expand this DataFrame so that it contains rows for all months in 2010.
The expansion is per id, so we would have 12 rows for each id. In this case, a total of 24 rows.
val at each month, if absent from the initial DataFrame, should be 0.
other has a 1-to-1 relationship with the id, so I would like to maintain it that way.
My desired result is the following:
date id other val
0 2010-01-01 1 uno 0
1 2010-02-01 1 uno 0
2 2010-03-01 1 uno 0
3 2010-04-01 1 uno 0
4 2010-05-01 1 uno 50
5 2010-06-01 1 uno 0
6 2010-07-01 1 uno 60
7 2010-08-01 1 uno 0
8 2010-09-01 1 uno 0
9 2010-10-01 1 uno 0
10 2010-11-01 1 uno 0
11 2010-12-01 1 uno 0
12 2010-01-01 2 dos 0
13 2010-02-01 2 dos 0
14 2010-03-01 2 dos 0
15 2010-04-01 2 dos 0
16 2010-05-01 2 dos 0
17 2010-06-01 2 dos 70
18 2010-07-01 2 dos 0
19 2010-08-01 2 dos 0
20 2010-09-01 2 dos 0
21 2010-10-01 2 dos 80
22 2010-11-01 2 dos 0
23 2010-12-01 2 dos 0
What I have tried:
I have tried to groupby('id'), then apply. The applied function reindexes the group. But I haven't managed to both fill the val with zeroes, and maintain other.
You can group by id and apply a custom function that reindexes each group to every month start in 2010 and then fills the resulting NaNs: other with ffill and bfill (forward and backward filling), and val with fillna and a constant:
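def f(x):
    # x holds the rows of one id, indexed by date; add a row for every month start in 2010
    x = x.reindex(pd.date_range('2010-01-01', '2010-12-01', freq='MS'))
    # other is constant within an id, so fill it in both directions
    x['other'] = x['other'].ffill().bfill()
    # months missing from the input get val 0
    x['val'] = x['val'].fillna(0)
    return x

# select the non-key columns so the result carries no duplicate 'id' column
df = (df.set_index('date')
        .groupby('id')[['other', 'val']]
        .apply(f)
        .rename_axis(['id', 'date'])
        .reset_index())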
print (df)
id date other val
0 1 2010-01-01 uno 0.0
1 1 2010-02-01 uno 0.0
2 1 2010-03-01 uno 0.0
3 1 2010-04-01 uno 0.0
4 1 2010-05-01 uno 50.0
5 1 2010-06-01 uno 0.0
6 1 2010-07-01 uno 60.0
7 1 2010-08-01 uno 0.0
8 1 2010-09-01 uno 0.0
9 1 2010-10-01 uno 0.0
10 1 2010-11-01 uno 0.0
11 1 2010-12-01 uno 0.0
12 2 2010-01-01 dos 0.0
13 2 2010-02-01 dos 0.0
14 2 2010-03-01 dos 0.0
15 2 2010-04-01 dos 0.0
16 2 2010-05-01 dos 0.0
17 2 2010-06-01 dos 70.0
18 2 2010-07-01 dos 0.0
19 2 2010-08-01 dos 0.0
20 2 2010-09-01 dos 0.0
21 2 2010-10-01 dos 80.0
22 2 2010-11-01 dos 0.0
23 2 2010-12-01 dos 0.0
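The val column in the output above is float because reindex introduces NaN before the fill. The following is a minimal sketch, assuming the sample data from the question, that casts val back to integers inside the applied function. The names months, expand and out are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'date': ['2010-05-01', '2010-07-01', '2010-06-01', '2010-10-01'],
                   'id': [1, 1, 2, 2],
                   'val': [50, 60, 70, 80],
                   'other': ['uno', 'uno', 'dos', 'dos']})
df['date'] = pd.to_datetime(df['date'])

# One timestamp per month start in 2010
months = pd.date_range('2010-01-01', '2010-12-01', freq='MS')

def expand(group):
    # group holds one id's rows, indexed by date
    group = group.reindex(months)
    group['other'] = group['other'].ffill().bfill()
    # fill the gaps with 0, then undo the float upcast caused by NaN
    group['val'] = group['val'].fillna(0).astype(int)
    return group

out = (df.set_index('date')
         .groupby('id')[['other', 'val']]
         .apply(expand)
         .rename_axis(['id', 'date'])
         .reset_index())
print(out)
```

The cast is only safe here because every missing val has already been replaced; with NaNs still present, astype(int) would raise.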
Another solution is to create a MultiIndex with MultiIndex.from_product and reindex by it:
# every (id, month start) combination for 2010
mux = pd.MultiIndex.from_product(
    [df['id'].unique(), pd.date_range('2010-01-01', '2010-12-01', freq='MS')],
    names=('id', 'date'))
df = df.set_index(['id', 'date']).reindex(mux).reset_index()
df['val'] = df['val'].fillna(0)
# transform keeps the original row index, so the result aligns with df on assignment
df['other'] = df.groupby('id')['other'].transform(lambda x: x.ffill().bfill())
print (df)
id date other val
0 1 2010-01-01 uno 0.0
1 1 2010-02-01 uno 0.0
2 1 2010-03-01 uno 0.0
3 1 2010-04-01 uno 0.0
4 1 2010-05-01 uno 50.0
5 1 2010-06-01 uno 0.0
6 1 2010-07-01 uno 60.0
7 1 2010-08-01 uno 0.0
8 1 2010-09-01 uno 0.0
9 1 2010-10-01 uno 0.0
10 1 2010-11-01 uno 0.0
11 1 2010-12-01 uno 0.0
12 2 2010-01-01 dos 0.0
13 2 2010-02-01 dos 0.0
14 2 2010-03-01 dos 0.0
15 2 2010-04-01 dos 0.0
16 2 2010-05-01 dos 0.0
17 2 2010-06-01 dos 70.0
18 2 2010-07-01 dos 0.0
19 2 2010-08-01 dos 0.0
20 2 2010-09-01 dos 0.0
21 2 2010-10-01 dos 80.0
22 2 2010-11-01 dos 0.0
23 2 2010-12-01 dos 0.0
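Because other is stated to be 1-to-1 with id, a variation on the MultiIndex approach can skip the forward and backward fill entirely. This is a hedged sketch rather than the answer's own method: it reindexes only val with fill_value=0 (which keeps the integer dtype) and restores other from a per-id lookup. The names months, other_by_id and out are illustrative assumptions.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'date': ['2010-05-01', '2010-07-01', '2010-06-01', '2010-10-01'],
                   'id': [1, 1, 2, 2],
                   'val': [50, 60, 70, 80],
                   'other': ['uno', 'uno', 'dos', 'dos']})
df['date'] = pd.to_datetime(df['date'])

months = pd.date_range('2010-01-01', '2010-12-01', freq='MS')
mux = pd.MultiIndex.from_product([df['id'].unique(), months], names=['id', 'date'])

# Lookup table id -> other, valid only because the relationship is 1-to-1
other_by_id = df.drop_duplicates('id').set_index('id')['other']

# Reindex the val Series alone; fill_value=0 avoids NaN, so val stays integer
out = (df.set_index(['id', 'date'])['val']
         .reindex(mux, fill_value=0)
         .reset_index())
out['other'] = out['id'].map(other_by_id)
print(out)
```

If the 1-to-1 assumption does not hold for some id, drop_duplicates silently keeps the first other value seen, so it is worth checking the data before relying on this.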