I have the following dataframe:
Disease HeartRate State MonthStart MonthEnd
Covid 89 Texas 2020-02-28 2020-03-31
Covid 91 Texas 2020-03-31 2020-04-30
Covid 87 Texas 2020-07-31 2020-08-30
Cancer 90 Texas 2020-02-28 2020-03-31
Cancer 88 Florida 2020-03-31 2020-04-30
Covid 89 Florida 2020-02-28 2020-03-31
Covid 87 Florida 2020-03-31 2020-04-30
Flu 90 Florida 2020-02-28 2020-03-31
I need to subtract the previous row from the current row in the 'HeartRate' column and store the result in a new column.
However, there are some conditions:
Desired output:
Disease HeartRate State MonthStart MonthEnd HeartRateDiff
Covid 89 Texas 2020-02-28 2020-03-31 89
Covid 91 Texas 2020-03-31 2020-04-30 2
Covid 87 Texas 2020-07-31 2020-08-30 87
Cancer 90 Texas 2020-02-28 2020-03-31 90
Cancer 88 Florida 2020-03-31 2020-04-30 88
Covid 89 Florida 2020-02-28 2020-03-31 89
Covid 87 Florida 2020-03-31 2020-04-30 -2
Flu 90 Florida 2020-02-28 2020-03-31 90
I know how to subtract previous row from the current row using the following code:
df['DiffHeartRate'] = df.groupby(['Disease', 'State'])['HeartRate'].transform(pd.Series.diff)
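For reference, here is a minimal self-contained sketch of what that line yields on the sample data. The frame construction is an assumption, since the question only shows the printed table, and `groupby(...).diff()` is used as the shorthand equivalent of `transform(pd.Series.diff)`.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (construction is assumed).
df = pd.DataFrame({
    'Disease': ['Covid', 'Covid', 'Covid', 'Cancer', 'Cancer', 'Covid', 'Covid', 'Flu'],
    'HeartRate': [89, 91, 87, 90, 88, 89, 87, 90],
    'State': ['Texas', 'Texas', 'Texas', 'Texas',
              'Florida', 'Florida', 'Florida', 'Florida'],
    'MonthStart': pd.to_datetime(['2020-02-28', '2020-03-31', '2020-07-31', '2020-02-28',
                                  '2020-03-31', '2020-02-28', '2020-03-31', '2020-02-28']),
    'MonthEnd': pd.to_datetime(['2020-03-31', '2020-04-30', '2020-08-30', '2020-03-31',
                                '2020-04-30', '2020-03-31', '2020-04-30', '2020-03-31']),
})

# Difference with the previous row inside each (Disease, State) group.
plain_diff = df.groupby(['Disease', 'State'])['HeartRate'].diff()
print(plain_diff.tolist())
# [nan, 2.0, -4.0, nan, nan, nan, -2.0, nan]
```

Compared with the desired output, two things differ: the first row of each group is NaN rather than the heart rate itself, and the July row for Covid/Texas is differenced against April's start row (-4) even though the months are not consecutive.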
However, I’m facing two problems:
Is there a smarter way of doing it? Any help would be appreciated. Thanks!
You may try something like this:
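# Assumes MonthStart is already a datetime column (e.g. via pd.to_datetime).
# The third grouping key starts a new block whenever the month is not previous month + 1.
df['DiffHeartRate'] = (df.groupby(['Disease', 'State',
    (df.MonthStart.dt.month.ne(df.MonthStart.dt.month.shift() + 1)).cumsum()])['HeartRate']
    .diff().fillna(df.HeartRate))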
Disease HeartRate State MonthStart MonthEnd DiffHeartRate
0 Covid 89 Texas 2020-02-28 2020-03-31 89.0
1 Covid 91 Texas 2020-03-31 2020-04-30 2.0
2 Covid 87 Texas 2020-07-31 2020-08-30 87.0
3 Cancer 90 Texas 2020-02-28 2020-03-31 90.0
4 Cancer 88 Florida 2020-03-31 2020-04-30 88.0
5 Covid 89 Florida 2020-02-28 2020-03-31 89.0
6 Covid 87 Florida 2020-03-31 2020-04-30 -2.0
7 Flu 90 Florida 2020-02-28 2020-03-31 90.0
The logic is the same as in the other answers, just expressed differently.
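One caveat worth noting: comparing calendar month numbers means a December to January step is not treated as consecutive. A hedged variant, assuming the rows are already in the order shown in the question, compares a running month index (year * 12 + month) instead. The helper name `add_heart_rate_diff` and the sample rows below are hypothetical, chosen only to show the year boundary.

```python
import pandas as pd

def add_heart_rate_diff(df):
    """Return a copy of df with DiffHeartRate added (hypothetical helper)."""
    # Running month index, so that Dec 2020 -> Jan 2021 counts as consecutive.
    month_idx = df['MonthStart'].dt.year * 12 + df['MonthStart'].dt.month
    # Start a new block whenever the month does not directly follow the previous row's.
    block = month_idx.ne(month_idx.shift() + 1).cumsum().rename('block')
    diff = df.groupby(['Disease', 'State', block])['HeartRate'].diff()
    out = df.copy()
    # The first row of each block has no predecessor, so it keeps its own heart rate.
    out['DiffHeartRate'] = diff.fillna(df['HeartRate'])
    return out

# Hypothetical rows (not from the question) spanning a year boundary and a gap.
sample = pd.DataFrame({
    'Disease': ['Covid'] * 4,
    'HeartRate': [80, 85, 83, 90],
    'State': ['Texas'] * 4,
    'MonthStart': pd.to_datetime(['2020-11-30', '2020-12-31', '2021-01-31', '2021-04-30']),
})

result = add_heart_rate_diff(sample)
print(result['DiffHeartRate'].tolist())
# [80.0, 5.0, -2.0, 90.0]
```

Like the answer above, this sketch compares each row with the row directly before it, so it depends on the frame being sorted by group and date beforehand.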