I have a DataFrame and I want to grab the most recent row for each CUSIP.
In [374]: df.head()
Out[374]: 
              CUSIP        COLA         COLB       COLC  
date                                                          
1992-05-08    AAA          238         4256      3.523346   
1992-07-13    AAA          NaN         4677      3.485577   
1992-12-12    BBB          221         5150      3.24
1995-12-12    BBB          254         5150      3.25
1997-12-12    BBB          245         NaN       3.25
1998-12-12    CCC          234         5140      3.24145
1999-12-12    CCC          223         5120      3.65145
I am using:
df = df.reset_index().groupby('CUSIP').last().reset_index().set_index('date')
I want this:
              CUSIP        COLA         COLB       COLC  
date           
1992-07-13    AAA          NaN         4677      3.485577      
1997-12-12    BBB          245         NaN       3.25
1999-12-12    CCC          223         5120      3.65145
Instead I am getting:
              CUSIP        COLA         COLB       COLC  
date           
1992-07-13    AAA          238         4677      3.485577      
1997-12-12    BBB          245         5150       3.25
1999-12-12    CCC          223         5120      3.65145
How do I get last() to take the last row of the groupby including the NaN's?
Thank you.
Calling last() after grouping does not return the last row of each group. GroupBy.last() takes the last non-null value of each column separately, so a NaN in a group's final row is replaced by an earlier value from the same group.
From the docs: "NA groups in GroupBy are automatically excluded".
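The behaviour described in the question can be reproduced with a small script. This is a minimal sketch: the frame is rebuilt by hand from the values shown in the question, and the variable name `mixed` is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data (values copied from the post).
dates = pd.to_datetime(
    ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
     "1997-12-12", "1998-12-12", "1999-12-12"]
).rename("date")
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "COLA": [238, np.nan, 221, 254, 245, 234, 223],
        "COLB": [4256, 4677, 5150, 5150, np.nan, 5140, 5120],
        "COLC": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=dates,
)

# last() is evaluated column by column and skips NaN by default,
# so each output row can mix values from different input rows.
mixed = df.groupby("CUSIP").last()
print(mixed)
```

For AAA the COLA value comes from the 1992-05-08 row and for BBB the COLB value comes from the 1995-12-12 row, which matches the unwanted output in the question.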
You can do this directly with an apply instead of last (and get the -1th row of each group):
In [11]: df.reset_index().groupby('CUSIP').apply(lambda x: x.iloc[-1]).reset_index(drop=True).set_index('date')
Out[11]: 
           CUSIP  COLA  COLB      COLC
date                                  
1992-07-13   AAA   NaN  4677  3.485577
1997-12-12   BBB   245   NaN  3.250000
1999-12-12   CCC   223  5120  3.651450
[3 rows x 4 columns]
In 0.13 (rc out now), a faster and more direct way will be to use cumcount:
In [12]: df[df.groupby('CUSIP').cumcount(ascending=False) == 0]
Out[12]: 
           CUSIP  COLA  COLB      COLC
date                                  
1992-07-13   AAA   NaN  4677  3.485577
1997-12-12   BBB   245   NaN  3.250000
1999-12-12   CCC   223  5120  3.651450
[3 rows x 4 columns]
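Two other spellings that should give the same rows in current pandas are GroupBy.tail(1) and DataFrame.drop_duplicates(keep='last'). The sketch below is not part of the original answer; it assumes the rows are already sorted by date within each CUSIP, and the names `last_rows` and `same_rows` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; assumed sorted by date within each CUSIP.
dates = pd.to_datetime(
    ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
     "1997-12-12", "1998-12-12", "1999-12-12"]
).rename("date")
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "COLA": [238, np.nan, 221, 254, 245, 234, 223],
        "COLB": [4256, 4677, 5150, 5150, np.nan, 5140, 5120],
        "COLC": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=dates,
)

# tail(1) keeps the final row of every group as-is, NaN included,
# and preserves the original date index.
last_rows = df.groupby("CUSIP").tail(1)

# Equivalent here: keep only the last occurrence of each CUSIP.
same_rows = df.drop_duplicates(subset="CUSIP", keep="last")

print(last_rows)
```

Both forms select whole rows rather than aggregating per column, so no reset_index/set_index round trip is needed to keep the date index.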