Grouping pandas DataFrame by 10 minute intervals [duplicate]

Question

Given the following pandas dataframe:

            timestamp
0     2018-10-05 23:07:02
1     2018-10-05 23:07:13
2     2018-10-05 23:07:23
3     2018-10-05 23:07:36
4     2018-10-05 23:08:02
5     2018-10-05 23:09:16
6     2018-10-05 23:09:21
7     2018-10-05 23:09:39
8     2018-10-05 23:09:47
9     2018-10-05 23:10:01
10    2018-10-05 23:10:11
11    2018-10-05 23:10:23
12    2018-10-05 23:10:59
13    2018-10-05 23:11:03
14    2018-10-08 03:35:32
15    2018-10-08 03:35:58
16    2018-10-08 03:37:16
17    2018-10-08 03:38:04
18    2018-10-08 03:38:30
19    2018-10-08 03:38:36
20    2018-10-08 03:38:42
21    2018-10-08 03:38:52
22    2018-10-08 03:38:57
23    2018-10-08 03:39:10
24    2018-10-08 03:39:27
25    2018-10-08 03:40:47
26    2018-10-08 03:40:54
27    2018-10-08 03:41:02
28    2018-10-08 03:41:12
29    2018-10-08 03:41:32

How can I label each row with a ten-minute period? For example:

            timestamp       10min_period
0     2018-10-05 23:07:02   period_1
1     2018-10-05 23:07:13   period_1
2     2018-10-05 23:07:23   period_1
3     2018-10-05 23:07:36   period_1
4     2018-10-05 23:08:02   period_1
5     2018-10-05 23:09:16   period_1
6     2018-10-05 23:09:21   period_1
7     2018-10-05 23:09:39   period_1
8     2018-10-05 23:09:47   period_1
9     2018-10-05 23:10:01   period_1
10    2018-10-05 23:10:11   period_1
11    2018-10-05 23:10:23   period_1
12    2018-10-05 23:10:59   period_1
13    2018-10-05 23:11:03   period_1
14    2018-10-08 03:35:32   period_2
15    2018-10-08 03:35:58   period_2
16    2018-10-08 03:37:16   period_2
17    2018-10-08 03:38:04   period_2
18    2018-10-08 03:38:30   period_2
19    2018-10-08 03:38:36   period_2
20    2018-10-08 03:38:42   period_2
21    2018-10-08 03:38:52   period_2
22    2018-10-08 03:38:57   period_2
23    2018-10-08 03:39:10   period_2
24    2018-10-08 03:39:27   period_2
25    2018-10-08 03:40:47   period_2
26    2018-10-08 04:40:54   period_3
27    2018-10-08 04:41:02   period_3
28    2018-10-08 04:41:12   period_3
29    2018-10-08 04:41:32   period_3

As you can see in the expected output above, each period_n label is created by counting ten-minute periods: when the datetime series exceeds the ten-minute threshold, a new label is created. I tried dt.floor('10Min'); however, it is not working because it doesn't keep track of where to start and where to end counting the ten-minute periods. I also tried:

a = df['timestamp'].offsets.DateOffset(minutes=10)

However, it is not working either. Any idea how to segment my DataFrame into periods of ten minutes? This question is different from others because I am not specifying any particular time at which to start counting. That is, I start counting from the first datetime row, and from there I count the ten-minute periods.
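To illustrate the behaviour described above, here is a minimal sketch contrasting dt.floor with bins measured from the first row. The four timestamps are made-up sample values, and the names floored, elapsed and anchored are only illustrative. Note that this fixed grid never restarts after a long gap; the bin number simply jumps.

```python
import pandas as pd

# Hypothetical sample values, all within about ten minutes of the first row.
ts = pd.Series(pd.to_datetime([
    "2018-10-05 23:07:02",
    "2018-10-05 23:10:01",
    "2018-10-05 23:17:01",
    "2018-10-05 23:17:03",
]))

# dt.floor snaps to clock boundaries (23:00, 23:10, ...), regardless of
# where the data starts, so the first two rows land in different bins.
floored = ts.dt.floor("10min")

# Measuring elapsed time from the first row instead gives bins that
# start at 23:07:02.
elapsed = ts - ts.iloc[0]
anchored = elapsed // pd.Timedelta("10min")

print(pd.DataFrame({"timestamp": ts, "floored": floored, "anchored": anchored}))
```

Here only the last row is more than ten minutes after the first one, so it is the only row that falls into the second anchored bin.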

UPDATE:

After converting the column to datetime, I also tried:

df['timestamp'].groupby(pd.TimeGrouper(freq='10Min'))

However, I got:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
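The error appears because a frequency-based grouper with no key looks at the index, which here is a plain RangeIndex. A minimal sketch of two ways around it, assuming the column is called timestamp as in the question and using pd.Grouper (pd.TimeGrouper was deprecated and, as far as I know, is no longer available in recent pandas releases). The three rows are made-up sample values, and both options still bin on clock-aligned boundaries rather than from the first row.

```python
import pandas as pd

# Hypothetical three-row sample mirroring the question's column name.
df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2018-10-05 23:07:02",
    "2018-10-05 23:09:47",
    "2018-10-05 23:10:01",
])})

# Option 1: tell the grouper which column holds the datetimes.
by_key = df.groupby(pd.Grouper(key="timestamp", freq="10min")).size()

# Option 2: move the datetimes into the index, then group on the index.
by_index = df.set_index("timestamp").groupby(pd.Grouper(freq="10min")).size()

print(by_key)
```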

cs95 · Accepted Answer

With a little vectorised arithmetic, this should be possible (and be performant):

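import numpy as np
import pandas as pd

# Convert to datetime if not already.
# df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
# Floor to 20 minutes first, then number the 10-minute bins that result.
u = (df.assign(timestamp=df['timestamp'].dt.floor('20min'))
       .groupby(pd.Grouper(key='timestamp', freq='10min'))
       .ngroup())

# Renumber the bins consecutively and build the 'period_<n>' labels.
df['10min_period'] = np.char.add('period_', (pd.factorize(u)[0] + 1).astype(str))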

Unfortunately, the drawback is that, while this will produce the expected output for your sample data, there is no easy way to handle contiguous intervals of 10 minutes. pd.Grouper does not begin grouping from the first date in your column, so the dt.floor('20min') is necessary as an initial step, and it will inadvertently lump some (or most) rows from "period_{i+1}" under "period_{i}".
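If exact anchoring matters more than vectorisation, one alternative (not the approach used above) is a plain loop that opens a new period at the first row not covered by the previous one. This is only a sketch under a few assumptions: the timestamps are sorted in ascending order, a row exactly ten minutes after the period start opens a new period, and label_periods is a hypothetical helper name. The sample rows are made up.

```python
import pandas as pd

def label_periods(timestamps, width="10min"):
    """Hypothetical helper: label sorted timestamps with periods that each
    start at the first timestamp outside the previous period."""
    width = pd.Timedelta(width)
    labels = []
    period_start = None
    period_no = 0
    for ts in timestamps:
        # Open a new period for the first row, or once `width` has elapsed
        # since the current period started.
        if period_start is None or ts - period_start >= width:
            period_start = ts
            period_no += 1
        labels.append(f"period_{period_no}")
    return labels

# Made-up sample: two rows close together, one exactly 10 minutes after
# the first, then two rows a few days later.
df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2018-10-05 23:07:02",
    "2018-10-05 23:11:03",
    "2018-10-05 23:17:02",
    "2018-10-08 03:35:32",
    "2018-10-08 03:41:32",
])})
df["10min_period"] = label_periods(df["timestamp"])
print(df)
```

The loop is slower than the vectorised version on large frames, but each period genuinely starts at a row in the data rather than at a clock boundary.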

Emmanuel Arias · Answer

To reproduce your problem, I did this:

index = pd.date_range(datetime.datetime.now().date() - datetime.timedelta(10), periods=100, freq='min')

This gives me the following DataFrame:

a = pd.DataFrame(index)
a
                     0
0  2018-10-28 00:00:00
1  2018-10-28 00:01:00
2  2018-10-28 00:02:00
3  2018-10-28 00:03:00
4  2018-10-28 00:04:00
5  2018-10-28 00:05:00
6  2018-10-28 00:06:00
7  2018-10-28 00:07:00
8  2018-10-28 00:08:00
9  2018-10-28 00:09:00
10 2018-10-28 00:10:00
                   ...
[100 rows x 1 columns]

Then, I do this:

a['period'] = a.apply(lambda x: "period_%d" % (x[0].minute // 10 + 1), axis=1)

And I get this result:

                     0    period
0  2018-10-28 00:00:00  period_1
1  2018-10-28 00:01:00  period_1
2  2018-10-28 00:02:00  period_1
3  2018-10-28 00:03:00  period_1
4  2018-10-28 00:04:00  period_1
5  2018-10-28 00:05:00  period_1
6  2018-10-28 00:06:00  period_1
7  2018-10-28 00:07:00  period_1
8  2018-10-28 00:08:00  period_1
9  2018-10-28 00:09:00  period_1
10 2018-10-28 00:10:00  period_2
11 2018-10-28 00:11:00  period_2
12 2018-10-28 00:12:00  period_2
13 2018-10-28 00:13:00  period_2
14 2018-10-28 00:14:00  period_2
15 2018-10-28 00:15:00  period_2
                             ...

I hope this has been helpful.
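The same minute-of-hour bucketing can probably be written without apply by using the .dt accessor. A minimal sketch, assuming the datetimes sit in a column named timestamp (a name chosen here for illustration) and using a made-up one-row-per-minute sample. Because only the minute of the hour is used, the numbering restarts every hour, so rows from different hours share labels.

```python
import pandas as pd

# Hypothetical sample: one row per minute, spanning more than an hour.
a = pd.DataFrame({
    "timestamp": pd.date_range("2018-10-28", periods=100, freq="min")
})

# Integer-divide the minute-of-hour by 10: minutes 0-9 -> 1, 10-19 -> 2, ...
bucket = a["timestamp"].dt.minute // 10 + 1
a["period"] = "period_" + bucket.astype(str)

print(a.iloc[[0, 10, 59, 60]])
```

Row 60 (01:00:00) gets period_1 again, which shows the hourly restart.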
