Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas resample on OHLC data from 1min to 1H

I use OHLC re-sampling of 1min time series data in Pandas, the 15min will work perfectly, for example on the following dataframe:

ohlc_dict = {'Open':'first', 'High':'max', 'Low':'min', 'Close': 'last'}
df.resample('15Min').apply(ohlc_dict).dropna(how='any').loc['2011-02-01']

Date Time             Open         High        Low        Close
------------------------------------------------------------------          
2011-02-01 09:30:00 3081.940    3086.860    3077.832    3081.214

2011-02-01 09:45:00 3082.422    3083.730    3071.922    3073.801

2011-02-01 10:00:00 3073.303    3078.345    3069.130    3078.345

2011-02-01 10:15:00 3078.563    3078.563    3071.522    3072.279

2011-02-01 10:30:00 3071.873    3071.873    3063.497    3067.364

2011-02-01 10:45:00 3066.735    3070.523    3063.402    3069.974

2011-02-01 11:00:00 3069.561    3069.981    3066.286    3069.981

2011-02-01 11:15:00 3070.602    3074.088    3070.373    3073.919

2011-02-01 13:00:00 3074.778    3074.823    3069.925    3069.925

2011-02-01 13:15:00 3070.096    3070.903    3063.457    3063.457

2011-02-01 13:30:00 3063.929    3067.358    3063.929    3067.358

2011-02-01 13:45:00 3067.570    3072.455    3067.570    3072.247

2011-02-01 14:00:00 3072.927    3081.357    3072.767    3080.175

2011-02-01 14:15:00 3078.843    3079.435    3076.733    3076.782

2011-02-01 14:30:00 3076.721    3081.980    3076.721    3081.912

2011-02-01 14:45:00 3082.822    3083.381    3076.722    3077.283

However, when I resample 1min to 1H, the problem comes out. I use default setting, and find the time start from 9 am, but the markert open at 9:30 am.

df.resample('1H').apply(ohlc_dict).dropna(how='any').loc['2011-02-01']

1HourOHLC Wrong in Morning

I then try to change the base setting, but fail in the afternoon session. The market should open at 13 pm and end at 15 pm, so there should be 13 pm, 14 pm, 15 pm, total 3 bars.

df.resample('60MIN',base=30).apply(ohlc_dict).dropna(how='any').loc['2011-02-01']

1HourOHLC Wrong in afternoon

In conclusion, the problem is I want it fitting in market and has 6 bars (9:30,10:30,11:30,1:00,2:00,3:00), but resample in pandas only give me 5 bars (9:30,10:30,11:30,1:30,2:30)

I am searching for a long time on net. But no use. Please help or try to give some ideas how to achieve this. Thanks.

like image 964
zhwech Avatar asked Sep 08 '25 07:09

zhwech


1 Answers

Update

Since Pandas version 1.3.0**: There is a simpler way, using the origin argument of pandas.DataFrame.resample.

From Pandas docs, setting origin to start will use the first value of the timeseries in your case 9:30 and for the below example 9:15.

df = df.resample('1h', origin='start').agg(ohlc_dict)

This works for all timeframes.


Original post

I had the same issue and could'nt find help online. So i wrote this script to convert 1 min OHLC data into 1 hour.

This assumes market timings 9:15am to 3:30pm. If market timings are different simply edit the start_time and end_time to suit your needs.

I havent put any additional checks in case trading was suspended during market hours.

Hope the code is helpful to someone. :)

Sample csv format

Date,O,H,L,C,V
2020-03-12 09:15:00,3860,3867.8,3763.35,3830,58630
2020-03-12 09:16:00,3840.05,3859.4,3809.65,3834.6,67155
2020-03-12 09:17:00,3832.55,3855.4,3823.75,3852,51891
2020-03-12 09:18:00,3851.65,3860.95,3846.35,3859,42205
2020-03-12 09:19:00,3859.45,3860,3848.1,3851.55,33194

Code

from pandas import read_csv, to_datetime, DataFrame
from datetime import time

file_path = 'BAJFINANCE-EQ.csv'


def add(data, b):
    # utility function
    # appends the value in dictionary 'b'
    # to corresponding key in dictionary 'data'
    for (key, value) in b.items():
        data[key].append(value)


df = read_csv(file_path,
              parse_dates=True,
              infer_datetime_format=True,
              na_filter=False)

df['Date'] = to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S')

# stores hourly data to convert to dataframe
data = {
    'Date': [],
    'O': [],
    'H': [],
    'L': [],
    'C': [],
    'V': []
}

start_time = [time(9, 15), time(10, 15), time(11, 15), time(
    12, 15), time(13, 15), time(14, 15), time(15, 15)]

end_time = [time(10, 14), time(11, 14), time(12, 14), time(
    13, 14), time(14, 14), time(15, 14), time(15, 29)]


# Market timings 9:15am to 3:30pm (6 hours 15 mins)
# We create 6 hourly bars and one 15 min bar
# as usually depicted in candlestick charts
i = 0
no_bars = df.shape[0]
while i < no_bars:

    if df.loc[i]['Date'].time() in end_time:
        end_idx = i + 1

        hour_df = df[start_idx:end_idx]

        add(data, {
            'Date': df.loc[start_idx]['Date'],
            'O':    hour_df['O'].iloc[0],
            'H':    hour_df['H'].max(),
            'L':    hour_df['L'].min(),
            'C':    hour_df['C'].iloc[-1],
            'V':    hour_df['V'].sum()
        })

    if df.loc[i]['Date'].time() in start_time:
        start_idx = i

        # optional optimisation for large datasets
        # skip ahead to loop faster
        i += 55

    i += 1


df = DataFrame(data=data).set_index(keys=['Date'])
# df.to_csv('out.csv')
print(df)
like image 121
Benny Thadikaran Avatar answered Sep 09 '25 20:09

Benny Thadikaran