I am trying to distribute the total of each time period evenly across the sub-periods of a higher-frequency time period.
What I did:
>>> rng = pandas.PeriodIndex(start='2014-01-01', periods=2, freq='W')
>>> ts = pandas.Series([i+1 for i in range(len(rng))], index=rng)
>>> ts
2013-12-30/2014-01-05    1
2014-01-06/2014-01-12    2
Freq: W-SUN, dtype: float64
>>> ts.resample('D')
2013-12-30     1
2013-12-31   NaN
2014-01-01   NaN
2014-01-02   NaN
2014-01-03   NaN
2014-01-04   NaN
2014-01-05   NaN
2014-01-06     2
2014-01-07   NaN
2014-01-08   NaN
2014-01-09   NaN
2014-01-10   NaN
2014-01-11   NaN
2014-01-12   NaN
Freq: D, dtype: float64
What I actually want is:
>>> ts.resample('D', some_miracle_thing)
2013-12-30     1/7
2013-12-31     1/7
2014-01-01     1/7
2014-01-02     1/7
2014-01-03     1/7
2014-01-04     1/7
2014-01-05     1/7
2014-01-06     2/7
2014-01-07     2/7
2014-01-08     2/7
2014-01-09     2/7
2014-01-10     2/7
2014-01-11     2/7
2014-01-12     2/7
Freq: D, dtype: float64
Is there a way to do it, for example with an x/7 lambda function?
First, ensure that your DataFrame has an index of type DatetimeIndex. Then use the resample function to either upsample (higher frequency) or downsample (lower frequency) your DataFrame. Then apply an aggregator (e.g. sum) to aggregate the values across the new sampling frequency.
To resample time series data means to summarize or aggregate the data by a new time period.
Resample time-series data. Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or the caller must pass the label of a datetime-like series/index to the on/level keyword parameter.
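As a rough illustration of the two directions described above, here is a small sketch on made-up data (the series `s` and the chosen frequencies are my own, not from the question). Note that downsampling with sum keeps the total, while upsampling with ffill only copies values, which is why an extra division step is needed to distribute a total.

```python
import pandas as pd

# Hypothetical half-hourly series, only for illustration.
idx = pd.date_range('2014-01-01', periods=4, freq='30min')
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Downsample: aggregate each 60-minute bin with sum.
hourly = s.resample('60min').sum()

# Upsample: move to a 15-minute grid and copy each value forward.
# The values are repeated, not divided, so the total is not preserved.
quarter = s.resample('15min').ffill()

print(hourly)
print(quarter)
```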
I hate this solution, but it works for upsampling when you're unsure of the number of new intervals. Going from week to day is easy: it's always 7 days per week. But I've found that the number of intervals produced by an upsample is usually not known in advance, and this solution handles that case.
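For the fixed week-to-day case from the question, a minimal sketch might look like this. It builds the daily periods covered by each weekly period and divides the weekly value by their count, so it does not hard-code the 7. It uses `pandas.period_range` rather than the `PeriodIndex(start=...)` constructor shown in the question, which, as far as I know, newer pandas releases no longer accept. The names `parts` and `daily` are my own.

```python
import pandas as pd

# Same weekly data as in the question, built with period_range.
rng = pd.period_range(start='2014-01-01', periods=2, freq='W')
ts = pd.Series([float(i + 1) for i in range(len(rng))], index=rng)

parts = []
for period, value in ts.items():
    # All daily periods that fall inside this weekly period.
    days = pd.period_range(start=period.start_time, end=period.end_time, freq='D')
    # Spread the weekly total evenly over those days.
    parts.append(pd.Series(value / len(days), index=days))

daily = pd.concat(parts)
print(daily)
```

The daily values sum back to the original weekly totals, which is a quick sanity check worth keeping around.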
The idea is to get the number of post-resample intervals into the initial pre-resampled dataframe, then re-resample and divide your data by the interval count. Side note - this is for a dataframe, not a series.
# Create unique group IDs by simply using the existing index (assumes an integer, non-duplicated index).
df['group'] = df.index
# Get the count of intervals for each post-resampled timestamp.
df['count'] = df.set_index('timestamp').resample('15min').ffill()['group'].value_counts()
# Resample all data again and fill so that the count is now included in every row.
df = df.set_index('timestamp').resample('15min').ffill()
# Apply the division on the entire dataframe and clean up.
df = df.div(df['count'], axis=0).reset_index().drop(['group', 'count'], axis=1)
I'd wrap this in a function and tuck it away so I never have to look at it again, with something like:
def distribute_upsample(df, index, freq): ...
Where index would be 'timestamp' and freq would be '15min'.
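One possible way to fill in that function is sketched below. This is my own reading of the steps above, not a tested library routine: it assumes every column other than the timestamp column is numeric, and it uses positional group IDs instead of relying on the existing index. The example frame and its `value` column are made up.

```python
import pandas as pd

def distribute_upsample(df, index, freq):
    """Upsample df to freq and split each row's values evenly over the rows it becomes."""
    indexed = df.set_index(index).sort_index()
    # Tag every original row with a positional group ID.
    group = pd.Series(range(len(indexed)), index=indexed.index)
    # Carry the IDs onto the finer grid so each new row knows its source row.
    group_up = group.resample(freq).ffill()
    # Number of new rows produced by each source row, repeated on every new row.
    counts = group_up.map(group_up.value_counts())
    # Upsample the data itself, then divide row-wise by the interval count.
    upsampled = indexed.resample(freq).ffill()
    return upsampled.div(counts, axis=0).reset_index()

# Hypothetical hourly data to try it on.
df = pd.DataFrame({
    'timestamp': pd.date_range('2014-01-01', periods=3, freq='60min'),
    'value': [4.0, 8.0, 12.0],
})
out = distribute_upsample(df, 'timestamp', '15min')
print(out)
```

One thing to watch: the upsampled range ends at the last original timestamp, so the final row maps to a single interval and keeps its full value instead of being split.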