Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count number of events per hour given in datetime/numpy

Assume that I have an array of timestamp's (given as datetime or numpy.datetime64 for instance). Furthermore, each entry represents an occurrence. I want to bucket the events on a per-hour basis; i.e. count how many events occurred per hour. Is there a straightforward way to do it? Some Numpy/Pandas magic? Or is the only way is to convert the timestamp's to seconds and bucket the seconds, as suggested here

like image 873
Dror Avatar asked Sep 14 '25 03:09

Dror


1 Answers

If you are using Pandas, and you have a time series that has a DateTime index, you can use the resample method, with how='count'. For example, in the following, rng is a range of timestamps, and ts is a series of values that uses rng as the index. (For your calculation, the values in the time series don't matter.) In this example, there are 360 "events" per hour, so the expected result is 360 for each hour except the last.

Create some sample data:

In [71]: import pandas as pd

In [72]: rng = pd.date_range('1/1/2011', periods=10000, freq='10S')

In [73]: ts = pd.Series(np.random.randint(0, 5, len(rng)), index=rng)

In [74]: ts
Out[74]: 
2011-01-01 00:00:00    2
2011-01-01 00:00:10    0
2011-01-01 00:00:20    1
2011-01-01 00:00:30    4
2011-01-01 00:00:40    1
2011-01-01 00:00:50    1
2011-01-01 00:01:00    2
2011-01-01 00:01:10    0
2011-01-01 00:01:20    3
2011-01-01 00:01:30    4
2011-01-01 00:01:40    2
2011-01-01 00:01:50    4
2011-01-01 00:02:00    1
2011-01-01 00:02:10    0
2011-01-01 00:02:20    4
...
2011-01-02 03:44:10    2
2011-01-02 03:44:20    0
2011-01-02 03:44:30    3
2011-01-02 03:44:40    0
2011-01-02 03:44:50    0
2011-01-02 03:45:00    4
2011-01-02 03:45:10    3
2011-01-02 03:45:20    2
2011-01-02 03:45:30    0
2011-01-02 03:45:40    1
2011-01-02 03:45:50    0
2011-01-02 03:46:00    2
2011-01-02 03:46:10    0
2011-01-02 03:46:20    2
2011-01-02 03:46:30    2
Freq: 10S, Length: 10000

Use the resample method to count the number of events in each hour. The first argument, 'H', means we are resampling to an hourly rate.

In [75]: ts.resample('H', how='count')
Out[75]: 
2011-01-01 00:00:00    360
2011-01-01 01:00:00    360
2011-01-01 02:00:00    360
2011-01-01 03:00:00    360
2011-01-01 04:00:00    360
2011-01-01 05:00:00    360
2011-01-01 06:00:00    360
2011-01-01 07:00:00    360
2011-01-01 08:00:00    360
2011-01-01 09:00:00    360
2011-01-01 10:00:00    360
2011-01-01 11:00:00    360
2011-01-01 12:00:00    360
2011-01-01 13:00:00    360
2011-01-01 14:00:00    360
2011-01-01 15:00:00    360
2011-01-01 16:00:00    360
2011-01-01 17:00:00    360
2011-01-01 18:00:00    360
2011-01-01 19:00:00    360
2011-01-01 20:00:00    360
2011-01-01 21:00:00    360
2011-01-01 22:00:00    360
2011-01-01 23:00:00    360
2011-01-02 00:00:00    360
2011-01-02 01:00:00    360
2011-01-02 02:00:00    360
2011-01-02 03:00:00    280
Freq: H, dtype: int64
like image 63
Warren Weckesser Avatar answered Sep 16 '25 17:09

Warren Weckesser