Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas : how to keep rows in groupby where column value = min [duplicate]

Tags:

python

pandas

here's a sample of my dataset

side     |  serial_number   |   inspector  |   date_1        |    date_2
top      |       10         |   Paul       |   4/1/18 13:21  |    4/1/18 14:22
bot      |       10         |   Jack       |   4/1/18 13:01  |    4/1/18 14:22
bot      |       11         |   Jack       |   4/1/18 14:01  |    4/1/18 14:53
top      |       11         |   Paul       |   4/1/18 14:25  |    4/1/18 14:53
top      |       12         |   Henry      |   4/1/18 14:25  |    4/1/18 14:58

For each unique tuple (serial_number, date_2), I want to keep the row where date_1 is minimum and keep every column, so that eventually my dataset looks like this :

side     |  serial_number   |   inspector  |   date_1        |    date_2
bot      |       10         |   Jack       |   4/1/18 13:01  |    4/1/18 14:22
bot      |       11         |   Jack       |   4/1/18 14:01  |    4/1/18 14:53
top      |       12         |   Henry      |   4/1/18 14:25  |    4/1/18 14:58

To do so, my current code looks like this :

import pandas as pd

df = pd.read_csv("data.csv") #getting the data in a pandas dataframe
df_sorted = df.groupby(['serial_number','date_2'], sort=False)['date_1'].min()
df_sorted .to_csv("data_sorted.csv")

So in the end, I got the right dataset but columns I'm not grouping by are missing. Here's the resulting dataset :

 serial_number   |     date_1        |    date_2
      10         |     4/1/18 13:01  |    4/1/18 14:22
      11         |     4/1/18 14:01  |    4/1/18 14:53
      12         |     4/1/18 14:25  |    4/1/18 14:58

How do I keep all columns ? Thank you.

like image 432
Jdoe Avatar asked Oct 19 '25 09:10

Jdoe


1 Answers

Instead of calling min, after your groupby, which returns the minimum value for each group, instead use idxmin, which returns the index value where the minimum occurs in each group:

df.groupby(['serial_number','date_2'])['date_1'].idxmin()

# serial_number  date_2             
# 10             2018-04-01 14:22:00    1
# 11             2018-04-01 14:53:00    2
# 12             2018-04-01 14:58:00    4

You can then use these indices with iloc to select the complete rows in your dataframe where the minimum for each group occurs:

df.iloc[df.groupby(['serial_number','date_2'])['date_1'].idxmin()]

#   side        serial_number inspector                    date_1  \
# 1  bot                   10     Jack        2018-04-01 13:01:00   
# 2  bot                   11     Jack        2018-04-01 14:01:00   
# 4  top                   12     Henry       2018-04-01 14:25:00   
# 
#                date_2  
# 1 2018-04-01 14:22:00  
# 2 2018-04-01 14:53:00  
# 4 2018-04-01 14:58:00  
like image 122
cmaher Avatar answered Oct 21 '25 22:10

cmaher



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!