Pandas : How to aggregate hourly count with time start and end

Question

I have a dataframe with start and end time for each unique rating ID.

import pandas as pd

d = {'ID': ['01','02','03','04','05','06'], 'Hour Start': [5,9,13,15,20,23], 'Hour End': [6,9,15,19,0,2]}
df = pd.DataFrame(data=d)

My goal is to count how many ratings were active in each hour across the whole dataset. For example, ID 01 started at 5 am and ended at 6 am, so 5 am and 6 am should each get a count of 1.

ID 06, however, started at 11 pm and ended at 2 am the next day, so every hour from 11 pm through 2 am should get a count of 1.

I want to output an hourly summary table, with one row per hour and the number of ratings active during that hour.

I have been thinking about a solution for a while.

Any help would be much appreciated! Thank you!

DavideBrex · Accepted Answer

You can convert both the hour start and hour end columns to datetime, then compute the difference between them. Finally, convert that time difference to hours by dividing its seconds by 3600:

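# Parse the integer hours as datetimes (the date part defaults to 1900-01-01)
df['Hours_s'] = pd.to_datetime(df['Hour Start'], format='%H')
df['Hours_e'] = pd.to_datetime(df['Hour End'], format='%H')
df['delta'] = df['Hours_e'] - df['Hours_s']
# Whole hours in the difference (.seconds stays non-negative across midnight)
df["count"] = df["delta"].dt.seconds // 3600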

Output:

   ID  Hour Start  Hour End  count
0  01           5         6      1
1  02           9         9      0
2  03          13        15      2
3  04          15        19      4
4  05          20         0      4
5  06          23         2      3
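
The counts for IDs 05 and 06 come out right because pandas stores a negative Timedelta as a negative number of days plus a non-negative seconds part. Here is a minimal sketch of that behaviour using ID 05's hours (20 to 0); the variable names are illustrative only and are not part of the answer:

```python
import pandas as pd

# ID 05: starts at 20:00 and ends at 00:00, both parsed onto the same date
start = pd.Timestamp("1900-01-01 20:00")
end = pd.Timestamp("1900-01-01 00:00")
delta = end - start  # minus 20 hours

# pandas normalizes this to days = -1 plus 4 hours' worth of seconds
wrapped = delta.days != 0
hours = delta.seconds // 3600
```

Note that a rating whose start and end hour are identical (ID 02) gets a count of 0 here, even though it is active during that one hour.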

UPDATE: to get the hourly summary, start from a 24-row table of zeros and add 1 to every hour each rating covers, counting both the start hour and the end hour:

final_tab = pd.DataFrame({"Hour": range(0,24), "Count": [0]*24})

for i, row in df.iterrows():
    if row["delta"].days != 0:
        # Range wraps past midnight: count start..23, then 0..end
        final_tab.iloc[row["Hour Start"]:24, 1] += 1
        final_tab.iloc[0:row["Hour End"]+1, 1] += 1
    else:
        # Same-day range: the end hour is included, matching the output below
        final_tab.iloc[row["Hour Start"]:row["Hour End"]+1, 1] += 1

Output:

print(final_tab)
   Hour Count
0   0   2
1   1   1
2   2   1
3   3   0
4   4   0
5   5   1
6   6   1
7   7   0
8   8   0
9   9   1
10  10  0
11  11  0
12  12  0
13  13  1
14  14  1
15  15  2
16  16  1
17  17  1
18  18  1
19  19  1
20  20  1
21  21  1
22  22  1
23  23  2
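
If you would rather avoid the datetime conversion, the same table can probably be built with modular arithmetic on the raw hour columns. This is only a sketch and not part of the original answer: `hourly_counts` is a hypothetical helper name, and it assumes that a rating with equal start and end hours covers that single hour rather than a full 24 hours.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['01', '02', '03', '04', '05', '06'],
    'Hour Start': [5, 9, 13, 15, 20, 23],
    'Hour End': [6, 9, 15, 19, 0, 2],
})

def hourly_counts(frame):
    """Count active ratings per hour, treating both endpoints as inclusive."""
    start = frame['Hour Start'].to_numpy()
    end = frame['Hour End'].to_numpy()
    # Hours elapsed after the start hour; % 24 handles ranges that cross midnight
    span = (end - start) % 24
    counts = np.zeros(24, dtype=int)
    for s, n in zip(start, span):
        # Every hour the rating touches, wrapped back into 0..23
        hours = (s + np.arange(n + 1)) % 24
        counts[hours] += 1
    return pd.DataFrame({'Hour': range(24), 'Count': counts})

summary = hourly_counts(df)
```

On the sample data, `summary` should match the table above, including the count of 2 at hours 0, 15 and 23.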

Scott Boston · Answer

IIUC, you can do it like this using pd.to_datetime and pd.date_range:

import numpy as np

#Convert hours to datetime
df['endTime'] = pd.to_datetime(df['Hour End'], format='%H')
df['startTime'] = pd.to_datetime(df['Hour Start'], format='%H')

#If 'Hour End' is less than 'Hour Start', assume it falls on the next day
df['endTime'] = np.where(df['Hour End'] < df['Hour Start'], 
                         df['endTime']+pd.Timedelta(days=1), 
                         df['endTime'])

#Create a series of hours per defined ranges ('Hour Start' to 'Hour End')
#freq='h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2
df_hourly = df.apply(lambda x: pd.Series(pd.date_range(x['startTime'], 
                                                       x['endTime'], 
                                                       freq='h')), 
                                         axis=1)\
              .stack().dt.hour

#Use value counts to count the hours and reindex to 24-hour day to fill missing hours.
df_hourly.value_counts().reindex(np.arange(0,24)).fillna(0).astype(int)

The result is a Series indexed by hour (0 to 23) holding the number of ratings active in each hour.

Alternatively, using explode and value_counts:

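#explode() returns an object-dtype Series, so convert back to datetime before using .dt
hours = df.apply(lambda x: pd.date_range(x['startTime'], 
                                         x['endTime'], 
                                         freq='h'), axis=1)\
  .explode()
pd.to_datetime(hours).dt.hour.value_counts()\
  .reindex(np.arange(0,24), fill_value=0)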
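
For anyone who wants to run the date_range idea end to end, here is a self-contained sketch on the question's sample data. It swaps np.where for Series.mask and the apply call for a list comprehension; those substitutions are assumptions made for this sketch, not the answerer's code.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['01', '02', '03', '04', '05', '06'],
    'Hour Start': [5, 9, 13, 15, 20, 23],
    'Hour End': [6, 9, 15, 19, 0, 2],
})

# Parse the hours as datetimes on a common default date
start = pd.to_datetime(df['Hour Start'].astype(str), format='%H')
end = pd.to_datetime(df['Hour End'].astype(str), format='%H')

# An end hour earlier than the start hour is assumed to fall on the next day
end = end.mask(df['Hour End'] < df['Hour Start'], end + pd.Timedelta(days=1))

# date_range includes both endpoints, so each rating contributes
# its start hour, its end hour, and everything in between
hours = [ts.hour
         for s, e in zip(start, end)
         for ts in pd.date_range(s, e, freq='h')]

# Count occurrences per hour and fill hours with no ratings with 0
counts = pd.Series(hours).value_counts().reindex(range(24), fill_value=0)
```

Here `counts` is indexed by hour, so `counts[15]` gives the number of ratings active at 3 pm.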

Tags: python, pandas
