I have a df that looks like this:
data start stop
10 1.0 1.5
10 2.0 2.5
10 3.0 3.5
10 4.0 4.5
10 5.0 5.5
10 6.0 6.5
10 7.0 7.5
10 8.0 8.5
14 9.0 9.5
14 10.0 10.5
10 11.0 11.5
10 12.0 12.5
10 13.0 13.5
10 14.0 14.5
14 15.0 15.5
10 16.0 16.5
10 17.0 17.5
11 18.0 18.5
11 19.0 19.5
11 20.0 20.5
I want to group the df by df.data and aggregate the df.start and df.stop time for the group in columns. It should look like this:
data start stop
10 1.0 8.5
14 9.0 10.5
10 11.0 14.5
14 15.0 15.5
10 16.0 17.5
11 18.0 20.5
You use ne + shift + cumsum to group by consecutive values. Then choose the proper aggregation for each column. Given the ordering of your data you could equally use 'first' and 'last' to aggregate start and stop respectively.
d = {'data': 'first', 'start': 'min', 'stop': 'max'} # How to aggregate
s = df.data.ne(df.data.shift(1)).cumsum().rename(None) # How to group
df.groupby(s).agg(d)
# data start stop
#1 10 1.0 8.5
#2 14 9.0 10.5
#3 10 11.0 14.5
#4 14 15.0 15.5
#5 10 16.0 17.5
#6 11 18.0 20.5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With