I'd like to rolling-window aggregate a dataframe, but it is crucial that my result keeps a column with a unique index, because later I want to join my aggregation back into the original dataframe.
So like
df = pd.DataFrame(
{"id": range(6),
"t": [pd.Timestamp("2019-01-01")+dt.timedelta(seconds=sec) for sec in [2, 2, 1, 1, 1, 1]],
"gr": list("ababab"),
"val": range(6)})
agg = df.groupby("gr").rolling("2s", on="t")["val"].sum() # <- id not present anymore
Now I'd like to join agg back into df on id but unfortunately Pandas rolling made t the level of the multi-index. (Time t is actually not unique)
Is there any way I can get id in agg ?
Looking at
>>> df.sort_values(["gr", "t"])
id t gr val
2 3 2019-01-01 00:00:01 a 2
4 1 2019-01-01 00:00:01 a 4
0 5 2019-01-01 00:00:02 a 0
3 2 2019-01-01 00:00:01 b 3
5 0 2019-01-01 00:00:01 b 5
1 4 2019-01-01 00:00:02 b 1
the expected sum should be
id t gr sum_val
2 3 2019-01-01 00:00:01 a 2
4 1 2019-01-01 00:00:01 a 6
0 5 2019-01-01 00:00:02 a 6
3 2 2019-01-01 00:00:01 b 3
5 0 2019-01-01 00:00:01 b 8
1 4 2019-01-01 00:00:02 b 9
However, the output of one suggested solution is
agg = df.sort_values("t").groupby(['gr']).rolling("2s", on="t")['val'].sum().reset_index(name='sum_val')
agg['id'] = df.sort_values(['gr'])['id'].values
agg.sort_values(["gr", "t"])
Output:
gr t sum_val id
0 a 2019-01-01 00:00:01 2.0 5
1 a 2019-01-01 00:00:01 6.0 3
2 a 2019-01-01 00:00:02 6.0 1
3 b 2019-01-01 00:00:01 3.0 4
4 b 2019-01-01 00:00:01 8.0 2
5 b 2019-01-01 00:00:02 9.0 0
gr='a', id=5 should be 6 though?!
UPDATE: I made the time more duplicate to demonstrate the problem.
UPDATE: Made t a time column because before it wasnt doing what it was supposed to do.
Do you mean:
df['roll'] = (df.groupby("gr", as_index=False, group_keys=False)
.apply(lambda x: x.rolling(2, on='t')['val'].sum())
)
Output:
id t gr val roll
0 0 1 a 0 NaN
1 1 1 b 1 NaN
2 2 1 a 2 2.0
3 3 1 b 3 4.0
4 4 2 a 4 6.0
5 5 2 b 5 8.0
Since t is non unique within a group but id is unique and you are grouping on gr, you can do:
agg = df.groupby(['gr']).rolling(2, on="t")['val'].sum().reset_index(name='sum_val')
agg['id'] = df.sort_values(['gr'])['id'].values
Output:
gr t sum_val id
0 a 1 NaN 0
1 a 1 2.0 2
2 a 2 6.0 4
3 b 1 NaN 1
4 b 1 4.0 3
5 b 2 8.0 5
If t is unique, you can do:
You can merge back with original df
df.groupby(['gr']).rolling(2, on="t")['val'].sum().reset_index(name='sum_val').merge(df[['id', 't', 'gr']])
Output:
gr t sum_val id
0 a 1 NaN 0
1 a 2 2.0 2
2 a 3 6.0 4
3 b 1 NaN 1
4 b 2 4.0 3
5 b 3 8.0 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With