Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Business hours between two Series of Timestamps excluding weekends AND holidays

I have a pandas DataFrame that looks like this (sample):

data = { 
    'start': ['2018-10-29 18:48:46.697000',
              '2018-10-29 19:01:10.887000',
              '2018-10-22 17:42:24.467000'], 
    'end': ['2018-10-31 17:56:38.830000',
            '2018-11-27 09:31:39.967000',
            '2018-11-28 18:33:35.243000' ]   
}
df = pd.DataFrame(data)
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

My goal is to calculate the (US) business hours between start and end, excluding weekends and holidays. For that I am using the CustomBusinessDay functionality of pandas as follows:

from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())

len(pd.bdate_range(start=df['start'][2], end=df['end'][2], freq=us_bd))
>> 26

This is correct in terms of business days (it does exclude weekends, and the holidays of Thanksgiving and Black Friday), but what I really want is the number of business hours between the two timestamps. So when I try the native BH:

len(pd.bdate_range(start=df['start'][2], end=df['end'][2], freq='BH'))
>> 216

which is incorrect, because it accounts for weekends, but not for the holidays. So, I have two questions:

  1. How to calculate business hours between two timestamps correctly, taking both weekends and holidays into account by excluding them
  2. How to propagate this calculation across the pandas Series in order to generate a new column in the DataFrame?

When I try something like:

df['diff'] = pd.bdate_range(start=df['start'], end=df['end'], freq='BH')

The result is:

TypeError: Can not convert input [...] of type to Timestamp

The error message also included the whole series in the array.

like image 833
nvergos Avatar asked Nov 16 '25 13:11

nvergos


1 Answers

You should use CustomBusinessHour and pd.date_range instead of pd.bdate_range.

The number of hours for your second row should be 145 because endtime is 09:31:39.967.

us_bh = CustomBusinessHour(calendar=USFederalHolidayCalendar())
df['count'] = df.apply(lambda x: len(pd.date_range(start=x.start, end=x.end, freq= us_bh)),axis=1)
df['diff'] = df.apply(lambda x: pd.date_range(start=x.start, end=x.end, freq= us_bh),axis=1)
print(df)


    start                     end                  count                                               diff
0 2018-10-29 18:48:46.697 2018-10-31 17:56:38.830     16  DatetimeIndex(['2018-10-30 09:00:00', '2018-10...
1 2018-10-29 19:01:10.887 2018-11-27 09:31:39.967    145  DatetimeIndex(['2018-10-30 09:00:00', '2018-10...
2 2018-10-22 17:42:24.467 2018-11-28 18:33:35.243    200  DatetimeIndex(['2018-10-23 09:00:00', '2018-10...

And diff columns start business hour will '2018-10-29 09:00:00' when you use pd.bdate_range.

us_bh = CustomBusinessHour(calendar=USFederalHolidayCalendar())
df['count'] = df.apply(lambda x: len(pd.bdate_range(start=x.start, end=x.end, freq= us_bh)),axis=1)
df['diff'] = df.apply(lambda x: pd.bdate_range(start=x.start, end=x.end, freq= us_bh),axis=1)
print(df)

                    start                     end  count                                               diff
0 2018-10-29 18:48:46.697 2018-10-31 17:56:38.830     16  DatetimeIndex(['2018-10-29 09:00:00', '2018-10...
1 2018-10-29 19:01:10.887 2018-11-27 09:31:39.967    152  DatetimeIndex(['2018-10-29 09:00:00', '2018-10...
2 2018-10-22 17:42:24.467 2018-11-28 18:33:35.243    200  DatetimeIndex(['2018-10-22 09:00:00', '2018-10...
like image 170
giser_yugang Avatar answered Nov 18 '25 05:11

giser_yugang



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!