How can I add an extra column containing the cumulative time difference, in seconds, for each course? For example, the initial table is:
 id_A       course     weight                ts_A       value
 id1        cotton     3.5       2017-04-27 01:35:30  150.000000
 id1        cotton     3.5       2017-04-27 01:36:00  416.666667
 id1        cotton     3.5       2017-04-27 01:36:30  700.000000
 id1        cotton     3.5       2017-04-27 01:37:00  950.000000
 id2     cotton blue   5.0       2017-04-27 02:35:30  150.000000
 id2     cotton blue   5.0       2017-04-27 02:36:00  450.000000
 id2     cotton blue   5.0       2017-04-27 02:36:30  520.666667
 id2     cotton blue   5.0       2017-04-27 02:37:00  610.000000
The expected result is:
 id_A       course     weight                ts_A       value      cum_delta_sec
 id1        cotton     3.5       2017-04-27 01:35:30  150.000000      0
 id1        cotton     3.5       2017-04-27 01:36:00  416.666667      30 
 id1        cotton     3.5       2017-04-27 01:36:30  700.000000      60
 id1        cotton     3.5       2017-04-27 01:37:00  950.000000      90
 id2     cotton blue   5.0       2017-04-27 02:35:30  150.000000      0
 id2     cotton blue   5.0       2017-04-27 02:36:00  450.000000      30
 id2     cotton blue   5.0       2017-04-27 02:36:30  520.666667      60
 id2     cotton blue   5.0       2017-04-27 02:37:00  610.000000      90
Cumulative, or span-of-time, measures are the most common way time is used in a measure. A cumulative measure sums data across a span of time.
The cumsum() method returns a DataFrame of the same shape containing the cumulative sum down each column. It walks through the values from the top, row by row, adding each value to the running total of the rows above it, so the last row contains the sum of all values in each column.
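As a quick illustration of the running total described above, here is a minimal sketch on a made-up two-column frame (the column names `a` and `b` and their values are invented for the example):

```python
import pandas as pd

# Hypothetical toy data, not taken from the question
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Each row holds the sum of that column from the top down to that row
running = toy.cumsum()

# The last row of the running total equals the plain column sums
last_row = running.iloc[-1]
```

Here `running["a"]` is 1, 3, 6 and `running["b"]` is 10, 30, 60.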
You can chain the diff method with cumsum:
import pandas as pd

# convert ts_A to datetime type
df.ts_A = pd.to_datetime(df.ts_A)
# convert ts_A to seconds (this assumes the integer view is in nanoseconds, i.e. datetime64[ns]),
# group by id_A, then use transform to accumulate the differences between consecutive rows
df['cum_delta_sec'] = df.ts_A.astype('int64').div(10**9).groupby(df.id_A).transform(lambda x: x.diff().fillna(0).cumsum())
df
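
A possible variant of the snippet above, sketched here as an alternative rather than as the original answer's method, avoids the integer conversion and works with timedeltas directly. It rebuilds only the `id_A` and `ts_A` columns from the question's table so that it runs on its own; the variable name `gaps` is chosen for the example.

```python
import pandas as pd

# Only the two columns needed for the calculation, copied from the question
df = pd.DataFrame({
    "id_A": ["id1"] * 4 + ["id2"] * 4,
    "ts_A": pd.to_datetime([
        "2017-04-27 01:35:30", "2017-04-27 01:36:00",
        "2017-04-27 01:36:30", "2017-04-27 01:37:00",
        "2017-04-27 02:35:30", "2017-04-27 02:36:00",
        "2017-04-27 02:36:30", "2017-04-27 02:37:00",
    ]),
})

# Gap to the previous row within each id_A, in seconds.
# The first row of each group has no predecessor, so its gap is set to 0.
gaps = df.groupby("id_A")["ts_A"].diff().dt.total_seconds().fillna(0)

# Running total of those gaps, restarted for each id_A
df["cum_delta_sec"] = gaps.groupby(df["id_A"]).cumsum()
```

Because the gaps are converted with `dt.total_seconds()`, this version does not depend on the unit of the underlying integer representation. The resulting column is float-valued (0.0, 30.0, 60.0, 90.0 for each id); append `.astype(int)` if integers are preferred.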
