I have a pandas series that looks like
Attribute DateEvent Value
Type A 2015-04-01 4
2015-04-02 5
2015-04-05 3
Type B 2015-04-01 1
2015-04-03 4
2015-04-05 1
How do I convert the values to a rolling sum (say, past two days) while making sure to account for missing dates in my DateEvent index (assuming the start date and end date of it are the full range? (For example, 2015-04-03
and 2015-04-04
are missing for Type A, and 2015-04-02
and 2015-04-04
are missing for Type B).
I've made a couple assumptions about what you want, please clarify:
Value = NaN
.NaN
anytime there's a missing date in the rolling window.Type A
and Type B
If I've assumed correctly,
import pandas as pd
import numpy as np
import io
datastring = io.StringIO(
"""
Attribute,DateEvent,Value
Type A,2017-04-02,1
Type A,2017-04-03,2
Type A,2017-04-04,3
Type A,2017-04-05,4
Type B,2017-04-02,1
Type B,2017-04-03,2
Type B,2017-04-04,3
Type B,2017-04-05,4
""")
s = pd.read_csv(
datastring,
index_col=['Attribute', 'DateEvent'],
parse_dates=True)
print(s)
Here's what it looks like. Each of Type A
and Type B
are missing 2017-04-01
.
Value
Attribute DateEvent
Type A 2017-04-02 1
2017-04-03 2
2017-04-04 3
2017-04-05 4
Type B 2017-04-02 1
2017-04-03 2
2017-04-04 3
2017-04-05 4
According to this answer, you have to reconstruct the index, then re-index your Series
to get one that includes all the dates.
# reconstruct index with all the dates
dates = pd.date_range("2017-04-01","2017-04-05", freq="1D")
attributes = ["Type A", "Type B"]
# create a new MultiIndex
index = pd.MultiIndex.from_product([attributes,dates],
names=["Attribute","DateEvent"])
# reindex the series
sNew = s.reindex(index)
The missing dates were added, with Value = NaN
.
Value
Attribute DateEvent
Type A 2017-04-01 NaN
2017-04-02 1.0
2017-04-03 2.0
2017-04-04 3.0
2017-04-05 4.0
Type B 2017-04-01 NaN
2017-04-02 1.0
2017-04-03 2.0
2017-04-04 3.0
2017-04-05 4.0
Now group the Series
by the Attribute
index column and apply a rolling window of size 2
with sum()
# group the series by the `Attribute` column
grouped = sNew.groupby(level="Attribute")
# Apply a 2 day rolling window
summed = grouped.rolling(2).sum()
Value
Attribute Attribute DateEvent
Type A Type A 2017-04-01 NaN
2017-04-02 NaN
2017-04-03 3.0
2017-04-04 5.0
2017-04-05 7.0
Type B Type B 2017-04-01 NaN
2017-04-02 NaN
2017-04-03 3.0
2017-04-04 5.0
2017-04-05 7.0
Final Note: No idea why there are now two Attribute
index columns, let me know if anyone figures that out.
EDIT: Turns out similar question was asked here. Check it out.
Source: How to fill in missing values with a multiIndex
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With