Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to groupby timeseries and aggregate start/stop times in pandas

Tags:

python

pandas

I have a df that looks like this:

       data  start  stop
    10    1.0   1.5
    10    2.0   2.5
    10    3.0   3.5
    10    4.0   4.5
    10    5.0   5.5
    10    6.0   6.5
    10    7.0   7.5
    10    8.0   8.5
    14    9.0   9.5
    14   10.0  10.5
    10   11.0  11.5
    10   12.0  12.5
    10   13.0  13.5
    10   14.0  14.5
    14   15.0  15.5
    10   16.0  16.5
    10   17.0  17.5
    11   18.0  18.5
    11   19.0  19.5
    11   20.0  20.5

I want to group the df by df.data and aggregate the df.start and df.stop time for the group in columns. It should look like this:

    data  start  stop
    10    1.0   8.5
    14    9.0   10.5
    10   11.0   14.5
    14   15.0   15.5
    10   16.0   17.5
    11   18.0   20.5
like image 294
connor449 Avatar asked Dec 05 '25 03:12

connor449


1 Answers

You use ne + shift + cumsum to group by consecutive values. Then choose the proper aggregation for each column. Given the ordering of your data you could equally use 'first' and 'last' to aggregate start and stop respectively.

d = {'data': 'first', 'start': 'min', 'stop': 'max'}   # How to aggregate
s = df.data.ne(df.data.shift(1)).cumsum().rename(None) # How to group

df.groupby(s).agg(d)
#   data  start  stop
#1    10    1.0   8.5
#2    14    9.0  10.5
#3    10   11.0  14.5
#4    14   15.0  15.5
#5    10   16.0  17.5
#6    11   18.0  20.5
like image 185
ALollz Avatar answered Dec 07 '25 15:12

ALollz



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!