
Conditional shift: Subtract 'previous row value' from 'current row value' with multiple conditions in pandas

I have the following dataframe:

Disease     HeartRate   State    MonthStart   MonthEnd    
Covid       89          Texas    2020-02-28   2020-03-31      
Covid       91          Texas    2020-03-31   2020-04-30     
Covid       87          Texas    2020-07-31   2020-08-30      
Cancer      90          Texas    2020-02-28   2020-03-31 
Cancer      88          Florida  2020-03-31   2020-04-30      
Covid       89          Florida  2020-02-28   2020-03-31      
Covid       87          Florida  2020-03-31   2020-04-30      
Flu         90          Florida  2020-02-28   2020-03-31        

I have to subtract the previous row's value from the current row's value in the 'HeartRate' column and store the result in a new column.

However, there are some conditions:

  1. Row values will be subtracted only when the 'Disease' and 'State' columns have the same values.
  2. Row values will be subtracted only when the rows are in consecutive months. If there is a break in the timeline, values won't be subtracted.
  3. If there is no previous row value to subtract, keep the 'HeartRate' value as is.

Desired output:

Disease     HeartRate   State    MonthStart   MonthEnd     HeartRateDiff
Covid       89          Texas    2020-02-28   2020-03-31    89      
Covid       91          Texas    2020-03-31   2020-04-30    2     
Covid       87          Texas    2020-07-31   2020-08-30    87      
Cancer      90          Texas    2020-02-28   2020-03-31    90 
Cancer      88          Florida  2020-03-31   2020-04-30    88          
Covid       89          Florida  2020-02-28   2020-03-31    89      
Covid       87          Florida  2020-03-31   2020-04-30    -2      
Flu         90          Florida  2020-02-28   2020-03-31    90      

I know how to subtract the previous row from the current row within each group using the following code:

df['DiffHeartRate'] = df.groupby(['Disease', 'State'])['HeartRate'].transform(pd.Series.diff)

However, I’m facing two problems:

  1. Keeping the same value if there is no previous row to subtract.
  2. Checking the continuity of timeline (next month or not).
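
Both problems can be reproduced on a small slice of the data. The following is only a minimal sketch: the three-row frame is an assumed subset (the Covid/Texas rows from the table above), and `plain` is an illustrative name.

```python
import pandas as pd

# Assumed subset of the data above: only the Covid/Texas rows
df = pd.DataFrame({
    "Disease": ["Covid", "Covid", "Covid"],
    "HeartRate": [89, 91, 87],
    "State": ["Texas", "Texas", "Texas"],
    "MonthStart": pd.to_datetime(["2020-02-28", "2020-03-31", "2020-07-31"]),
})

# Plain per-group difference, with no timeline check
plain = df.groupby(["Disease", "State"])["HeartRate"].diff()
print(plain)
```

The first row comes out as NaN instead of 89 (problem 1), and the July row comes out as 87 - 91 even though April through June are missing (problem 2).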

Is there a smarter way of doing it? Any help would be appreciated. Thanks!

Roy asked Oct 21 '25 00:10



1 Answer

You may try something like this:

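# Assumes MonthStart is already datetime, e.g. df['MonthStart'] = pd.to_datetime(df['MonthStart'])
# A new block starts wherever the month is not the previous row's month + 1
block = df.MonthStart.dt.month.ne(df.MonthStart.dt.month.shift() + 1).cumsum()
df['DiffHeartRate'] = (df.groupby(['Disease', 'State', block])['HeartRate']
                         .diff()
                         .fillna(df.HeartRate))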

    Disease HeartRate   State   MonthStart  MonthEnd    DiffHeartRate
0   Covid   89          Texas   2020-02-28  2020-03-31  89.0
1   Covid   91          Texas   2020-03-31  2020-04-30  2.0
2   Covid   87          Texas   2020-07-31  2020-08-30  87.0
3   Cancer  90          Texas   2020-02-28  2020-03-31  90.0
4   Cancer  88          Florida 2020-03-31  2020-04-30  88.0
5   Covid   89          Florida 2020-02-28  2020-03-31  89.0
6   Covid   87          Florida 2020-03-31  2020-04-30  -2.0
7   Flu     90          Florida 2020-02-28  2020-03-31  90.0

The logic is the same as in the other answers, just expressed differently.
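
One caveat: comparing bare month numbers treats December followed by January as a break. If the data can span a year boundary, a variant along these lines may be more robust. This is only a sketch, not part of the approach above: it assumes the rows of each Disease/State group are already sorted by MonthStart, and `month_idx` and `consecutive` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Disease": ["Covid", "Covid", "Covid", "Cancer", "Cancer", "Covid", "Covid", "Flu"],
    "HeartRate": [89, 91, 87, 90, 88, 89, 87, 90],
    "State": ["Texas", "Texas", "Texas", "Texas", "Florida", "Florida", "Florida", "Florida"],
    "MonthStart": pd.to_datetime([
        "2020-02-28", "2020-03-31", "2020-07-31", "2020-02-28",
        "2020-03-31", "2020-02-28", "2020-03-31", "2020-02-28",
    ]),
})

# Month counter that keeps increasing across years (year * 12 + month)
month_idx = df["MonthStart"].dt.year * 12 + df["MonthStart"].dt.month

# True where the previous row of the same group is exactly one month earlier;
# the first row of each group compares against NaN and is therefore False
consecutive = month_idx.groupby([df["Disease"], df["State"]]).diff().eq(1)

# Keep the per-group difference for consecutive months, else the raw HeartRate
diff = df.groupby(["Disease", "State"])["HeartRate"].diff()
df["DiffHeartRate"] = diff.where(consecutive, df["HeartRate"])
print(df)
```

On the sample data this gives the same DiffHeartRate column as shown above; the difference only shows up when a group runs from December into January.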

Pygirl answered Oct 22 '25 15:10
