How to groupby timeseries and aggregate start/stop times in pandas

Question

I have a df that looks like this:

       data  start  stop
    10    1.0   1.5
    10    2.0   2.5
    10    3.0   3.5
    10    4.0   4.5
    10    5.0   5.5
    10    6.0   6.5
    10    7.0   7.5
    10    8.0   8.5
    14    9.0   9.5
    14   10.0  10.5
    10   11.0  11.5
    10   12.0  12.5
    10   13.0  13.5
    10   14.0  14.5
    14   15.0  15.5
    10   16.0  16.5
    10   17.0  17.5
    11   18.0  18.5
    11   19.0  19.5
    11   20.0  20.5

I want to group the df by df.data and aggregate the df.start and df.stop time for the group in columns. It should look like this:

    data  start  stop
    10    1.0   8.5
    14    9.0   10.5
    10   11.0   14.5
    14   15.0   15.5
    10   16.0   17.5
    11   18.0   20.5

ALollz · Accepted Answer

You use ne + shift + cumsum to group by consecutive values. Then choose the proper aggregation for each column. Given the ordering of your data you could equally use 'first' and 'last' to aggregate start and stop respectively.

d = {'data': 'first', 'start': 'min', 'stop': 'max'}   # How to aggregate
s = df.data.ne(df.data.shift(1)).cumsum().rename(None) # How to group

df.groupby(s).agg(d)
#   data  start  stop
#1    10    1.0   8.5
#2    14    9.0  10.5
#3    10   11.0  14.5
#4    14   15.0  15.5
#5    10   16.0  17.5
#6    11   18.0  20.5

How to groupby timeseries and aggregate start/stop times in pandas

Tags:

python

pandas

connor449

1 Answers

ALollz

Recent Activity

Donate For Us

How to groupby timeseries and aggregate start/stop times in pandas

Tags:

python

pandas

connor449

1 Answers

ALollz

Related questions

Recent Activity

Donate For Us