I have the following data frame:
id day total_amount
1 2015-07-09 1000
1 2015-10-22 100
1 2015-11-12 200
1 2015-11-27 2392
1 2015-12-16 123
7 2015-07-09 200
7 2015-07-09 1000
7 2015-08-27 100018
7 2015-11-25 1000
8 2015-08-27 1000
8 2015-12-07 10000
8 2016-01-18 796
8 2016-03-31 10000
15 2015-09-10 1500
15 2015-09-30 1000
I need to subtract every two successive time in day column if they have the same id until reaching the last row of that id then start subtracting times in day column this time for new id, something similar to following lines in output is expected:
1 2015-08-09 1000 2015-11-22 - 2015-08-09
1 2015-11-22 100 2015-12-12 - 2015-11-22
1 2015-12-12 200 2015-12-16 - 2015-12-12
1 2015-12-16 2392 2015-12-27 - 2015-12-27
1 2015-12-27 123 NA
7 2015-08-09 200 2015-09-09 - 2015-08-09
7 2015-09-09 1000 2015-09-27 - 2015-09-09
7 2015-09-27 100018 2015-12-25 - 2015-09-27
7 2015-12-25 1000 NA
8 2015-08-27 1000 2015-12-07 - 2015-08-27
8 2015-12-07 10000 2016-02-18 - 2015-12-07
8 2016-02-18 796 2016-04-31- 2016-02-18
8 2016-04-31 10000 NA
15 2015-10-10 1500 2015-10-30 - 2015-10-10
15 2015-10-30 1000 NA
You can use DataFrameGroupBy.diff:
df['dif'] = df.groupby('id')['day'].diff(-1) * (-1)
print (df)
id day total_amount dif
0 1 2015-07-09 1000 105 days
1 1 2015-10-22 100 21 days
2 1 2015-11-12 200 15 days
3 1 2015-11-27 2392 19 days
4 1 2015-12-16 123 NaT
5 7 2015-07-09 200 0 days
6 7 2015-07-09 1000 49 days
7 7 2015-08-27 100018 90 days
8 7 2015-11-25 1000 NaT
9 8 2015-08-27 1000 102 days
10 8 2015-12-07 10000 42 days
11 8 2016-01-18 796 73 days
12 8 2016-03-31 10000 NaT
13 15 2015-09-10 1500 20 days
14 15 2015-09-30 1000 NaT
Another solution with apply shift:
df['diff'] = df.groupby('id')['day'].apply(lambda x: x.shift(-1) - x)
print (df)
id day total_amount diff
0 1 2015-07-09 1000 105 days
1 1 2015-10-22 100 21 days
2 1 2015-11-12 200 15 days
3 1 2015-11-27 2392 19 days
4 1 2015-12-16 123 NaT
5 7 2015-07-09 200 0 days
6 7 2015-07-09 1000 49 days
7 7 2015-08-27 100018 90 days
8 7 2015-11-25 1000 NaT
9 8 2015-08-27 1000 102 days
10 8 2015-12-07 10000 42 days
11 8 2016-01-18 796 73 days
12 8 2016-03-31 10000 NaT
13 15 2015-09-10 1500 20 days
14 15 2015-09-30 1000 NaT
EDIT by comment:
If you need difference in hours as int, convert timedelta to hour:
df['diff'] = df.groupby('id')['day'].diff(-1) * (-1) / np.timedelta64(1, 'h')
print (df)
id day total_amount diff
0 1 2015-07-09 1000 2520.0
1 1 2015-10-22 100 504.0
2 1 2015-11-12 200 360.0
3 1 2015-11-27 2392 456.0
4 1 2015-12-16 123 NaN
5 7 2015-07-09 200 0.0
6 7 2015-07-09 1000 1176.0
7 7 2015-08-27 100018 2160.0
8 7 2015-11-25 1000 NaN
9 8 2015-08-27 1000 2448.0
10 8 2015-12-07 10000 1008.0
11 8 2016-01-18 796 1752.0
12 8 2016-03-31 10000 NaN
13 15 2015-09-10 1500 480.0
14 15 2015-09-30 1000 NaN
df['diff'] = df.groupby('id')['day'].apply(lambda x: x.shift(-1) - x) /
np.timedelta64(1, 'h')
print (df)
id day total_amount diff
0 1 2015-07-09 1000 2520.0
1 1 2015-10-22 100 504.0
2 1 2015-11-12 200 360.0
3 1 2015-11-27 2392 456.0
4 1 2015-12-16 123 NaN
5 7 2015-07-09 200 0.0
6 7 2015-07-09 1000 1176.0
7 7 2015-08-27 100018 2160.0
8 7 2015-11-25 1000 NaN
9 8 2015-08-27 1000 2448.0
10 8 2015-12-07 10000 1008.0
11 8 2016-01-18 796 1752.0
12 8 2016-03-31 10000 NaN
13 15 2015-09-10 1500 480.0
14 15 2015-09-30 1000 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With