I use OHLC re-sampling of 1min time series data in Pandas, the 15min will work perfectly, for example on the following dataframe:
ohlc_dict = {'Open':'first', 'High':'max', 'Low':'min', 'Close': 'last'}
df.resample('15Min').apply(ohlc_dict).dropna(how='any').loc['2011-02-01']
Date Time Open High Low Close
------------------------------------------------------------------
2011-02-01 09:30:00 3081.940 3086.860 3077.832 3081.214
2011-02-01 09:45:00 3082.422 3083.730 3071.922 3073.801
2011-02-01 10:00:00 3073.303 3078.345 3069.130 3078.345
2011-02-01 10:15:00 3078.563 3078.563 3071.522 3072.279
2011-02-01 10:30:00 3071.873 3071.873 3063.497 3067.364
2011-02-01 10:45:00 3066.735 3070.523 3063.402 3069.974
2011-02-01 11:00:00 3069.561 3069.981 3066.286 3069.981
2011-02-01 11:15:00 3070.602 3074.088 3070.373 3073.919
2011-02-01 13:00:00 3074.778 3074.823 3069.925 3069.925
2011-02-01 13:15:00 3070.096 3070.903 3063.457 3063.457
2011-02-01 13:30:00 3063.929 3067.358 3063.929 3067.358
2011-02-01 13:45:00 3067.570 3072.455 3067.570 3072.247
2011-02-01 14:00:00 3072.927 3081.357 3072.767 3080.175
2011-02-01 14:15:00 3078.843 3079.435 3076.733 3076.782
2011-02-01 14:30:00 3076.721 3081.980 3076.721 3081.912
2011-02-01 14:45:00 3082.822 3083.381 3076.722 3077.283
However, when I resample 1min to 1H, the problem comes out. I use default setting, and find the time start from 9 am, but the markert open at 9:30 am.
df.resample('1H').apply(ohlc_dict).dropna(how='any').loc['2011-02-01']
1HourOHLC Wrong in Morning
I then try to change the base
setting, but fail in the afternoon session. The market should open at 13 pm and end at 15 pm, so there should be 13 pm, 14 pm, 15 pm, total 3 bars.
df.resample('60MIN',base=30).apply(ohlc_dict).dropna(how='any').loc['2011-02-01']
1HourOHLC Wrong in afternoon
In conclusion, the problem is I want it fitting in market and has 6 bars (9:30,10:30,11:30,1:00,2:00,3:00)
, but resample
in pandas
only give me 5 bars (9:30,10:30,11:30,1:30,2:30)
I am searching for a long time on net. But no use. Please help or try to give some ideas how to achieve this. Thanks.
Update
Since Pandas version 1.3.0**: There is a simpler way, using the origin
argument of pandas.DataFrame.resample
.
From Pandas docs, setting origin
to start
will use the first value of the timeseries in your case 9:30 and for the below example 9:15.
df = df.resample('1h', origin='start').agg(ohlc_dict)
This works for all timeframes.
Original post
I had the same issue and could'nt find help online. So i wrote this script to convert 1 min OHLC data into 1 hour.
This assumes market timings 9:15am to 3:30pm. If market timings are different simply edit the start_time and end_time to suit your needs.
I havent put any additional checks in case trading was suspended during market hours.
Hope the code is helpful to someone. :)
Sample csv format
Date,O,H,L,C,V
2020-03-12 09:15:00,3860,3867.8,3763.35,3830,58630
2020-03-12 09:16:00,3840.05,3859.4,3809.65,3834.6,67155
2020-03-12 09:17:00,3832.55,3855.4,3823.75,3852,51891
2020-03-12 09:18:00,3851.65,3860.95,3846.35,3859,42205
2020-03-12 09:19:00,3859.45,3860,3848.1,3851.55,33194
Code
from pandas import read_csv, to_datetime, DataFrame
from datetime import time
file_path = 'BAJFINANCE-EQ.csv'
def add(data, b):
# utility function
# appends the value in dictionary 'b'
# to corresponding key in dictionary 'data'
for (key, value) in b.items():
data[key].append(value)
df = read_csv(file_path,
parse_dates=True,
infer_datetime_format=True,
na_filter=False)
df['Date'] = to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S')
# stores hourly data to convert to dataframe
data = {
'Date': [],
'O': [],
'H': [],
'L': [],
'C': [],
'V': []
}
start_time = [time(9, 15), time(10, 15), time(11, 15), time(
12, 15), time(13, 15), time(14, 15), time(15, 15)]
end_time = [time(10, 14), time(11, 14), time(12, 14), time(
13, 14), time(14, 14), time(15, 14), time(15, 29)]
# Market timings 9:15am to 3:30pm (6 hours 15 mins)
# We create 6 hourly bars and one 15 min bar
# as usually depicted in candlestick charts
i = 0
no_bars = df.shape[0]
while i < no_bars:
if df.loc[i]['Date'].time() in end_time:
end_idx = i + 1
hour_df = df[start_idx:end_idx]
add(data, {
'Date': df.loc[start_idx]['Date'],
'O': hour_df['O'].iloc[0],
'H': hour_df['H'].max(),
'L': hour_df['L'].min(),
'C': hour_df['C'].iloc[-1],
'V': hour_df['V'].sum()
})
if df.loc[i]['Date'].time() in start_time:
start_idx = i
# optional optimisation for large datasets
# skip ahead to loop faster
i += 55
i += 1
df = DataFrame(data=data).set_index(keys=['Date'])
# df.to_csv('out.csv')
print(df)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With