Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subtract successive rows in a dataframe grouped by id in pandas(Python)

I have the following data frame:

id        day           total_amount
 1       2015-07-09         1000
 1       2015-10-22          100
 1       2015-11-12          200
 1       2015-11-27         2392
 1       2015-12-16          123
 7       2015-07-09          200
 7       2015-07-09         1000
 7       2015-08-27       100018
 7       2015-11-25         1000
 8       2015-08-27         1000
 8       2015-12-07        10000
 8       2016-01-18          796
 8       2016-03-31        10000
15       2015-09-10         1500
15       2015-09-30         1000

I need to subtract every two successive time in day column if they have the same id until reaching the last row of that id then start subtracting times in day column this time for new id, something similar to following lines in output is expected:

 1       2015-08-09         1000 2015-11-22 - 2015-08-09
 1       2015-11-22          100 2015-12-12 - 2015-11-22
 1       2015-12-12          200 2015-12-16 - 2015-12-12
 1       2015-12-16         2392 2015-12-27 - 2015-12-27
 1       2015-12-27          123         NA
 7       2015-08-09          200 2015-09-09 - 2015-08-09
 7       2015-09-09         1000 2015-09-27 - 2015-09-09
 7       2015-09-27       100018 2015-12-25 - 2015-09-27
 7       2015-12-25         1000         NA
 8       2015-08-27         1000  2015-12-07 - 2015-08-27
 8       2015-12-07        10000  2016-02-18 - 2015-12-07
 8       2016-02-18          796   2016-04-31- 2016-02-18     
 8       2016-04-31        10000         NA
15       2015-10-10         1500  2015-10-30 - 2015-10-10
15       2015-10-30         1000         NA
like image 851
chessosapiens Avatar asked Oct 26 '25 20:10

chessosapiens


1 Answers

You can use DataFrameGroupBy.diff:

df['dif'] = df.groupby('id')['day'].diff(-1) * (-1)
print (df)
    id        day  total_amount      dif
0    1 2015-07-09          1000 105 days
1    1 2015-10-22           100  21 days
2    1 2015-11-12           200  15 days
3    1 2015-11-27          2392  19 days
4    1 2015-12-16           123      NaT
5    7 2015-07-09           200   0 days
6    7 2015-07-09          1000  49 days
7    7 2015-08-27        100018  90 days
8    7 2015-11-25          1000      NaT
9    8 2015-08-27          1000 102 days
10   8 2015-12-07         10000  42 days
11   8 2016-01-18           796  73 days
12   8 2016-03-31         10000      NaT
13  15 2015-09-10          1500  20 days
14  15 2015-09-30          1000      NaT

Another solution with apply shift:

df['diff'] = df.groupby('id')['day'].apply(lambda x: x.shift(-1) - x)
print (df)
    id        day  total_amount     diff
0    1 2015-07-09          1000 105 days
1    1 2015-10-22           100  21 days
2    1 2015-11-12           200  15 days
3    1 2015-11-27          2392  19 days
4    1 2015-12-16           123      NaT
5    7 2015-07-09           200   0 days
6    7 2015-07-09          1000  49 days
7    7 2015-08-27        100018  90 days
8    7 2015-11-25          1000      NaT
9    8 2015-08-27          1000 102 days
10   8 2015-12-07         10000  42 days
11   8 2016-01-18           796  73 days
12   8 2016-03-31         10000      NaT
13  15 2015-09-10          1500  20 days
14  15 2015-09-30          1000      NaT

EDIT by comment:

If you need difference in hours as int, convert timedelta to hour:

df['diff'] = df.groupby('id')['day'].diff(-1) * (-1) / np.timedelta64(1, 'h')
print (df)
    id        day  total_amount    diff
0    1 2015-07-09          1000  2520.0
1    1 2015-10-22           100   504.0
2    1 2015-11-12           200   360.0
3    1 2015-11-27          2392   456.0
4    1 2015-12-16           123     NaN
5    7 2015-07-09           200     0.0
6    7 2015-07-09          1000  1176.0
7    7 2015-08-27        100018  2160.0
8    7 2015-11-25          1000     NaN
9    8 2015-08-27          1000  2448.0
10   8 2015-12-07         10000  1008.0
11   8 2016-01-18           796  1752.0
12   8 2016-03-31         10000     NaN
13  15 2015-09-10          1500   480.0
14  15 2015-09-30          1000     NaN
df['diff'] = df.groupby('id')['day'].apply(lambda x: x.shift(-1) - x) / 
                                     np.timedelta64(1, 'h')
print (df)
    id        day  total_amount    diff
0    1 2015-07-09          1000  2520.0
1    1 2015-10-22           100   504.0
2    1 2015-11-12           200   360.0
3    1 2015-11-27          2392   456.0
4    1 2015-12-16           123     NaN
5    7 2015-07-09           200     0.0
6    7 2015-07-09          1000  1176.0
7    7 2015-08-27        100018  2160.0
8    7 2015-11-25          1000     NaN
9    8 2015-08-27          1000  2448.0
10   8 2015-12-07         10000  1008.0
11   8 2016-01-18           796  1752.0
12   8 2016-03-31         10000     NaN
13  15 2015-09-10          1500   480.0
14  15 2015-09-30          1000     NaN
like image 102
jezrael Avatar answered Oct 29 '25 09:10

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!