I have the following dataframe (df)
ID start end Diff
A 1/8/2020 12:00:00 AM 1/8/2020 12:00:10 AM 10
A 1/8/2020 12:00:05 AM 1/8/2020 12:00:10 AM 5
B 1/9/2020 1:00:05 AM 1/9/2020 1:00:10 AM 5
B 1/9/2020 1:00:06 AM 1/9/2020 1:00:10 AM 4
B 1/9/2020 1:00:20 AM 1/9/2020 1:00:25 AM 5
C 1/10/2020 5:00:05 AM 1/10/2020 5:00:25 AM 20
C 1/10/2020 5:00:40 AM 1/10/2020 5:00:45 AM 5
Here is the problem: When the end times are identical, I would like to delete the row with the longer duration, deleting the row with the shortest duration.
Desired outcome:
ID start end Diff
A 1/8/2020 12:00:05 AM 1/8/2020 12:00:10 AM 5
B 1/9/2020 1:00:06 AM 1/9/2020 1:00:10 AM 4
B 1/9/2020 1:00:20 AM 1/9/2020 1:00:25 AM 5
C 1/10/2020 5:00:05 AM 1/10/2020 5:00:25 AM 20
C 1/10/2020 5:00:40 AM 1/10/2020 5:00:45 AM 5
Essentially, when the end time is identical, I wish to remove the row that has the longer duration. I have tried this, however, it is not taking into account the condition: Retain the shorter duration row when the end time is identical
df.sort_values(['Diff']).drop_duplicates(subset=['ID'])
Any suggestion(s) is appreciated.
Use groupby on end column transformed with the minimum value of Diff , then compare with df['Diff'] and keep those which return True, check how transform returns the minimum over an entire group below:
df[df['Diff'].eq(df.groupby('end')['Diff'].transform('min'))]
ID start end Diff
1 A 1/8/2020 12:00:05 AM 1/8/2020 12:00:10 AM 5
3 B 1/9/2020 1:00:06 AM 1/9/2020 1:00:10 AM 4
4 B 1/9/2020 1:00:20 AM 1/9/2020 1:00:25 AM 5
5 C 1/10/2020 5:00:05 AM 1/10/2020 5:00:25 AM 20
6 C 1/10/2020 5:00:40 AM 1/10/2020 5:00:45 AM 5
Output of groupby+transform
print(df.groupby('end')['Diff'].transform('min'))
0 5
1 5
2 4
3 4
4 5
5 20
6 5
print(df['Diff'].eq(df.groupby('end')['Diff'].transform('min')))
0 False
1 True
2 False
3 True
4 True
5 True
6 True
We could use Series.map
df[df['Diff'].eq(df['end'].map(df.groupby('end')['Diff'].min()))]
ID start end Diff
1 A 1/8/2020 12:00:05 AM 1/8/2020 12:00:10 AM 5
3 B 1/9/2020 1:00:06 AM 1/9/2020 1:00:10 AM 4
4 B 1/9/2020 1:00:20 AM 1/9/2020 1:00:25 AM 5
5 C 1/10/2020 5:00:05 AM 1/10/2020 5:00:25 AM 20
6 C 1/10/2020 5:00:40 AM 1/10/2020 5:00:45 AM 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With