I have a df with the usual timestamps as an index:
    2011-04-01 09:30:00
    2011-04-01 09:30:10
    ...
    2011-04-01 09:36:20
    ...
    2011-04-01 09:37:30
How can I add a column to this DataFrame containing the same timestamp rounded up to the next 5-minute boundary? Like this:
    index                 new_col
    2011-04-01 09:30:00   2011-04-01 09:35:00
    2011-04-01 09:30:10   2011-04-01 09:35:00
    2011-04-01 09:36:20   2011-04-01 09:40:00
    2011-04-01 09:37:30   2011-04-01 09:40:00
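The round_to_5min(t) solution using timedelta arithmetic is correct but complicated and very slow. Instead, make use of pandas' fast Timestamp handling: convert the index to integer nanoseconds, floor-divide by the interval, add one, and multiply back: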
    import numpy as np
    import pandas as pd
    ns5min = 5 * 60 * 1000000000   # 5 minutes in nanoseconds
    pd.to_datetime((df.index.astype(np.int64) // ns5min + 1) * ns5min)
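To show the result landing in a new column, here is a minimal sketch built on the four sample timestamps from the question. Note that it swaps in a different technique from the integer arithmetic above: it uses `DatetimeIndex.floor` plus a five-minute `Timedelta`, which should give the same "always move to the next boundary" behaviour without depending on the index being stored in nanoseconds. The column name `new_col` comes from the question; the rest is an assumption for illustration.

```python
import pandas as pd

# Sample index taken from the question.
idx = pd.to_datetime([
    "2011-04-01 09:30:00",
    "2011-04-01 09:30:10",
    "2011-04-01 09:36:20",
    "2011-04-01 09:37:30",
])
df = pd.DataFrame(index=idx)

# Floor to the previous 5-minute boundary, then step one interval forward.
# Like the integer version, this moves exact boundaries (09:30:00) to the
# next one (09:35:00), matching the output requested in the question.
df["new_col"] = df.index.floor("5min") + pd.Timedelta(minutes=5)

print(df)
```

If exact boundaries should stay where they are, `df.index.ceil("5min")` is probably the closer fit.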
Let's compare the speed:
    rng = pd.date_range('1/1/2014', '1/2/2014', freq='S')
    print(len(rng))
    # 86401
    # timed with IPython's %timeit
    %timeit pd.to_datetime((rng.astype(np.int64) // ns5min + 1) * ns5min)
    # 1000 loops, best of 3: 1.01 ms per loop
    %timeit rng.map(round_to_5min)
    # 1 loops, best of 3: 1.03 s per loop
Just about 1000 times faster!
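The `round_to_5min` function used in the comparison is not reproduced here. As a rough idea of what a per-element timedelta version might look like, here is a hypothetical sketch (the body is an assumption, not the original function) that works on one `datetime` at a time, which is why mapping it over 86401 elements is so much slower than the vectorized integer arithmetic.

```python
from datetime import datetime, timedelta

def round_to_5min(t):
    # Hypothetical stand-in for the slower per-element approach:
    # measure the time elapsed since midnight, then move forward to
    # the next 5-minute boundary (exact boundaries also move forward).
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=5)
    intervals = (t - midnight) // step   # whole 5-minute intervals so far
    return midnight + (intervals + 1) * step

print(round_to_5min(datetime(2011, 4, 1, 9, 36, 20)))
```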