Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Date MultiIndex with missing dates - Rolling sum

I have a pandas series that looks like

Attribute      DateEvent     Value
Type A         2015-04-01    4
               2015-04-02    5
               2015-04-05    3
Type B         2015-04-01    1
               2015-04-03    4
               2015-04-05    1

How do I convert the values to a rolling sum (say, past two days) while making sure to account for missing dates in my DateEvent index (assuming the start date and end date of it are the full range? (For example, 2015-04-03 and 2015-04-04 are missing for Type A, and 2015-04-02 and 2015-04-04 are missing for Type B).

like image 954
John Avatar asked Oct 12 '25 09:10

John


1 Answers

I've made a couple assumptions about what you want, please clarify:

  1. You want the rows that have missing dates, to be considered as having Value = NaN.
  2. As a result, the past 2 day rolling sum should return NaN anytime there's a missing date in the rolling window.
  3. You want to compute the rolling sum within in each group Type A and Type B

If I've assumed correctly,

Create sample data set

import pandas as pd
import numpy as np
import io

datastring = io.StringIO(
"""
Attribute,DateEvent,Value
Type A,2017-04-02,1
Type A,2017-04-03,2
Type A,2017-04-04,3
Type A,2017-04-05,4
Type B,2017-04-02,1
Type B,2017-04-03,2
Type B,2017-04-04,3
Type B,2017-04-05,4
""")

s = pd.read_csv(
            datastring, 
            index_col=['Attribute', 'DateEvent'],
            parse_dates=True)
print(s)

Here's what it looks like. Each of Type A and Type B are missing 2017-04-01.

                      Value
Attribute DateEvent        
Type A    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4
Type B    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4

Solution

According to this answer, you have to reconstruct the index, then re-index your Series to get one that includes all the dates.

# reconstruct index with all the dates
dates = pd.date_range("2017-04-01","2017-04-05", freq="1D")
attributes = ["Type A", "Type B"]
# create a new MultiIndex
index = pd.MultiIndex.from_product([attributes,dates], 
        names=["Attribute","DateEvent"])
# reindex the series
sNew = s.reindex(index)

The missing dates were added, with Value = NaN.

                      Value
Attribute DateEvent        
Type A    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0
Type B    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0

Now group the Series by the Attribute index column and apply a rolling window of size 2 with sum()

# group the series by the `Attribute` column
grouped = sNew.groupby(level="Attribute")
# Apply a 2 day rolling window
summed = grouped.rolling(2).sum()

Final Output

                                Value
Attribute Attribute DateEvent        
Type A    Type A    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0
Type B    Type B    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0

Final Note: No idea why there are now two Attribute index columns, let me know if anyone figures that out.

EDIT: Turns out similar question was asked here. Check it out.

Source: How to fill in missing values with a multiIndex

like image 196
Filip Kilibarda Avatar answered Oct 13 '25 23:10

Filip Kilibarda