
Aggregating/resampling a pandas multiindex dataframe over many timeframes and forecasting ARIMA

I have multiple timeframes I want to track and predict values for (ARIMA forecasting 1 period ahead per timeframe), and my algorithm retrieves data on the lowest timeframe. Note: each timeframe is evenly (but not necessarily symmetrically) divisible by lower timeframes. As an important side note, intraday timeframes (1H, 4H) should start at 9:30am (market open time).

1min, 5min, 15min, 30min, 1H, 4H, 1D, 1W, 1M

First, the algorithm does a bulk history call and gets all 1-minute data within a given timespan (warm up data). Then during the course of its runtime (never stops; it's a trading algorithm), it receives data every minute.

The first step (after receiving the bulk historical data) is to aggregate it into each higher timeframe, building each timeframe from the next-lower one's smaller dataset to save processing time (a sketch of this cascade follows the list below):

i.e.,

1min timeframe data = 1min data (historical bulk data)
5min timeframe data = aggregated 1min timeframe data to 5min
15min timeframe data = aggregated 5min timeframe data to 15min
etc...
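
A minimal sketch of that cascade, assuming the bulk history sits in a (symbol, time)-indexed frame named one_min_history (bars, aggregate, and OHLC are illustrative names, and the per-symbol grouping inside aggregate is the approach A1 arrives at below):

import pandas as pd

OHLC = {"open": "first", "close": "last", "low": "min", "high": "max"}

def aggregate(df, freq):
    # Aggregate the time level per symbol; see A1 below.
    return df.groupby(["symbol", pd.Grouper(freq=freq, level="time")]).agg(OHLC)

# Build each timeframe from the next-lower one, so each step works on a
# progressively smaller frame. Note: '1H'/'4H' bins align to midnight by
# default; anchoring them at 09:30 is covered after the accepted answer.
bars = {"1min": one_min_history}
for lower, higher in [("1min", "5min"), ("5min", "15min"), ("15min", "30min"),
                      ("30min", "1H"), ("1H", "4H"), ("4H", "1D"),
                      ("1D", "1W"), ("1W", "1M")]:
    bars[higher] = aggregate(bars[lower], higher)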


Given the following example dataset (and assume there could be other symbols included):

                                 close        high         low        open
symbol time                                                                  
SPY    2018-05-22 09:31:00  270.245900  270.374166  270.226167  270.305100   
       2018-05-22 09:32:00  270.344566  270.374166  270.206434  270.245900   
       2018-05-22 09:33:00  270.374166  270.374166  270.314966  270.344566   
       2018-05-22 09:34:00  270.275500  270.374166  270.245900  270.374166   
       2018-05-22 09:35:00  270.413632  270.443232  270.275500  270.275500   
       2018-05-22 09:36:00  270.502431  270.522165  270.384032  270.413632   
       2018-05-22 09:37:00  270.541898  270.591231  270.492565  270.502431

Q1: How do I aggregate a multi-index pandas dataframe such that the time index and columns are aggregated per symbol? A partial untested (and likely not working) solution illustrating my intent follows:

# Where timeframe.Frequency is a string such as "1H" or "1min".
# Note: this raises on a (symbol, time) MultiIndex unless resample is
# told which level holds the datetimes, and even then it would mix all
# symbols into the same bars.
df.resample(timeframe.Frequency).agg({"open": "first", "close": "last", "low": "min", "high": "max"})

A1:

(df.groupby(['symbol', pd.Grouper(freq=timeframe.Frequency, level='time')])
   .agg({'open': 'first', 'close': 'last', 'high': 'max', 'low': 'min'}))


Q2: Given that there will be partially completed 'bars' for higher timeframes (e.g., at 9:37am the 5min dataset will contain a full bar for 9:35am plus a partial bar represented as if it were the 9:40am bar), is it wise to keep partial data in a given timeframe's dataset, or should I use QuantConnect's consolidator to complete the bar before adding it to the dataset during runtime? And how do I account for this partial bar data when processing history?

A2: Partial bars are indeed created when resampling. For example, if there is only a single 1-minute data point after June 1st, the monthly timeframe dataframe will have a June 30th entry whose OHLC values are those of that one 1-minute bar. At that point I can either remove the row or keep updating it with new values as data comes in.
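
For the history-processing side, one workable pattern is to treat each symbol's most recent bar as provisional and exclude it before modeling. A minimal sketch, assuming the (symbol, time)-indexed frames from above (bars_5min is a hypothetical name for one aggregated frame) and statsmodels for the ARIMA step, with an arbitrary placeholder order:

from statsmodels.tsa.arima.model import ARIMA

# Drop each symbol's most recent (possibly partial) bar so that only
# completed bars reach the model.
completed = bars_5min.groupby(level="symbol", group_keys=False).apply(
    lambda g: g.iloc[:-1]
)

# Fit on one symbol's closes and forecast one bar ahead; the (1, 1, 1)
# order is a placeholder, not a recommendation.
closes = completed.loc["SPY", "close"]
next_close = ARIMA(closes, order=(1, 1, 1)).fit().forecast(steps=1)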

asked by SnakeWasTheNameTheyGaveMe


1 Answer

For your first question, assuming that the data is indexed by symbol and time:

# Group by symbol and by the time index floored to 5-minute bins, then
# take the usual OHLC aggregations within each bin.
(df.groupby(['symbol', df.index.get_level_values(1).floor('5T')])
   .agg({"open": "first",
         "close": "last",
         "low": "min",
         "high": "max"})
)

gives:

                                open        close       low         high
symbol  time                
SPY     2018-05-22 09:30:00     270.305     270.276     270.206     270.374
SPY     2018-05-22 09:35:00     270.276     270.542     270.276     270.591
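
One caveat, since the question wants intraday bars anchored at 9:30am: both floor() and pd.Grouper bin from midnight by default, which happens to place 5-minute edges at 09:30 and 09:35 but would place 4H edges at 08:00 and 12:00. A sketch of one way to shift the bins, assuming pandas >= 1.1 (where pd.Grouper accepts an offset):

# Shift the midnight-based bin edges by 9h30min so that 4H bars run
# 09:30-13:30, 13:30-17:30, and so on.
(df.groupby(['symbol', pd.Grouper(freq='4H', level='time', offset='9h30min')])
   .agg({"open": "first", "close": "last", "low": "min", "high": "max"})
)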

The second question is too qualitative, and the answer depends heavily on your data as well as your sampling windows.

answered by Quang Hoang


