I use OHLC re-sampling of 1min time series data in Pandas, the 15min will work perfectly, for example on the following dataframe:
ohlc_dict = {'Open':'first', 'High':'max', 'Low':'min', 'Close': 'last'}
df.resample('15Min').apply(ohlc_dict).dropna(how='any').loc['2011-02-01']
Date Time             Open         High        Low        Close
------------------------------------------------------------------          
2011-02-01 09:30:00 3081.940    3086.860    3077.832    3081.214
2011-02-01 09:45:00 3082.422    3083.730    3071.922    3073.801
2011-02-01 10:00:00 3073.303    3078.345    3069.130    3078.345
2011-02-01 10:15:00 3078.563    3078.563    3071.522    3072.279
2011-02-01 10:30:00 3071.873    3071.873    3063.497    3067.364
2011-02-01 10:45:00 3066.735    3070.523    3063.402    3069.974
2011-02-01 11:00:00 3069.561    3069.981    3066.286    3069.981
2011-02-01 11:15:00 3070.602    3074.088    3070.373    3073.919
2011-02-01 13:00:00 3074.778    3074.823    3069.925    3069.925
2011-02-01 13:15:00 3070.096    3070.903    3063.457    3063.457
2011-02-01 13:30:00 3063.929    3067.358    3063.929    3067.358
2011-02-01 13:45:00 3067.570    3072.455    3067.570    3072.247
2011-02-01 14:00:00 3072.927    3081.357    3072.767    3080.175
2011-02-01 14:15:00 3078.843    3079.435    3076.733    3076.782
2011-02-01 14:30:00 3076.721    3081.980    3076.721    3081.912
2011-02-01 14:45:00 3082.822    3083.381    3076.722    3077.283
However, when I resample 1min to 1H, the problem comes out. I use default setting, and find the time start from 9 am, but the markert open at 9:30 am.
df.resample('1H').apply(ohlc_dict).dropna(how='any').loc['2011-02-01']
1HourOHLC Wrong in Morning
I then try to change the base setting, but fail in the afternoon session. The market should open at 13 pm and end at 15 pm, so there should be 13 pm, 14 pm, 15 pm, total 3 bars.
df.resample('60MIN',base=30).apply(ohlc_dict).dropna(how='any').loc['2011-02-01']
1HourOHLC Wrong in afternoon
In conclusion, the problem is I want it fitting in market and has 6 bars (9:30,10:30,11:30,1:00,2:00,3:00), but resample in pandas only give me 5 bars (9:30,10:30,11:30,1:30,2:30)
I am searching for a long time on net. But no use. Please help or try to give some ideas how to achieve this. Thanks.
Update
Since Pandas version 1.3.0**: There is a simpler way, using the origin argument of pandas.DataFrame.resample.
From Pandas docs, setting origin to start will use the first value of the timeseries in your case 9:30 and for the below example 9:15.
df = df.resample('1h', origin='start').agg(ohlc_dict)
This works for all timeframes.
Original post
I had the same issue and could'nt find help online. So i wrote this script to convert 1 min OHLC data into 1 hour.
This assumes market timings 9:15am to 3:30pm. If market timings are different simply edit the start_time and end_time to suit your needs.
I havent put any additional checks in case trading was suspended during market hours.
Hope the code is helpful to someone. :)
Sample csv format
Date,O,H,L,C,V
2020-03-12 09:15:00,3860,3867.8,3763.35,3830,58630
2020-03-12 09:16:00,3840.05,3859.4,3809.65,3834.6,67155
2020-03-12 09:17:00,3832.55,3855.4,3823.75,3852,51891
2020-03-12 09:18:00,3851.65,3860.95,3846.35,3859,42205
2020-03-12 09:19:00,3859.45,3860,3848.1,3851.55,33194
Code
from pandas import read_csv, to_datetime, DataFrame
from datetime import time
file_path = 'BAJFINANCE-EQ.csv'
def add(data, b):
    # utility function
    # appends the value in dictionary 'b'
    # to corresponding key in dictionary 'data'
    for (key, value) in b.items():
        data[key].append(value)
df = read_csv(file_path,
              parse_dates=True,
              infer_datetime_format=True,
              na_filter=False)
df['Date'] = to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S')
# stores hourly data to convert to dataframe
data = {
    'Date': [],
    'O': [],
    'H': [],
    'L': [],
    'C': [],
    'V': []
}
start_time = [time(9, 15), time(10, 15), time(11, 15), time(
    12, 15), time(13, 15), time(14, 15), time(15, 15)]
end_time = [time(10, 14), time(11, 14), time(12, 14), time(
    13, 14), time(14, 14), time(15, 14), time(15, 29)]
# Market timings 9:15am to 3:30pm (6 hours 15 mins)
# We create 6 hourly bars and one 15 min bar
# as usually depicted in candlestick charts
i = 0
no_bars = df.shape[0]
while i < no_bars:
    if df.loc[i]['Date'].time() in end_time:
        end_idx = i + 1
        hour_df = df[start_idx:end_idx]
        add(data, {
            'Date': df.loc[start_idx]['Date'],
            'O':    hour_df['O'].iloc[0],
            'H':    hour_df['H'].max(),
            'L':    hour_df['L'].min(),
            'C':    hour_df['C'].iloc[-1],
            'V':    hour_df['V'].sum()
        })
    if df.loc[i]['Date'].time() in start_time:
        start_idx = i
        # optional optimisation for large datasets
        # skip ahead to loop faster
        i += 55
    i += 1
df = DataFrame(data=data).set_index(keys=['Date'])
# df.to_csv('out.csv')
print(df)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With