Groupby - taking last element - how do I keep nan's?

Tags: python, pandas

I have a DataFrame and I want to grab the most recent row for each CUSIP.

In [374]: df.head()
Out[374]: 
              CUSIP        COLA         COLB       COLC  
date                                                          
1992-05-08    AAA          238         4256      3.523346   
1992-07-13    AAA          NaN         4677      3.485577   
1992-12-12    BBB          221         5150      3.24
1995-12-12    BBB          254         5150      3.25
1997-12-12    BBB          245         NaN       3.25
1998-12-12    CCC          234         5140      3.24145
1999-12-12    CCC          223         5120      3.65145

I am using:

df = df.reset_index().groupby('CUSIP').last().reset_index().set_index('date')

I want this:

              CUSIP        COLA         COLB       COLC  
date           
1992-07-13    AAA          NaN         4677      3.485577      
1997-12-12    BBB          245         NaN       3.25
1999-12-12    CCC          223         5120      3.65145

Instead I am getting:

              CUSIP        COLA         COLB       COLC  
date           
1992-07-13    AAA          238         4677      3.485577      
1997-12-12    BBB          245         5150      3.25
1999-12-12    CCC          223         5120      3.65145

How do I get last() to return the last row of each group, NaNs included?

Thank you.

asked Dec 17 '13 20:12

user1911092

1 Answer

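last() returns the last non-null value in each column, which is why the NaNs are being replaced by earlier values. To take the last row of each group as-is, use apply with iloc[-1] instead of last: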

In [11]: df.reset_index().groupby('CUSIP').apply(lambda x: x.iloc[-1]).reset_index(drop=True).set_index('date')
Out[11]: 
           CUSIP  COLA  COLB      COLC
date                                  
1992-07-13   AAA   NaN  4677  3.485577
1997-12-12   BBB   245   NaN  3.250000
1999-12-12   CCC   223  5120  3.651450

[3 rows x 4 columns]
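
To make the difference concrete, here is a minimal, self-contained sketch. It assumes a frame typed in by hand from the question's sample, and it uses groupby(...).tail(1) as a positional stand-in for the apply/iloc[-1] approach. That substitution is my assumption, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question (illustrative only).
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"],
        "COLA": [238, np.nan, 221, 254, 245, 234, 223],
        "COLB": [4256, 4677, 5150, 5150, np.nan, 5140, 5120],
        "COLC": [3.523346, 3.485577, 3.24, 3.25, 3.25, 3.24145, 3.65145],
    },
    index=pd.to_datetime(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1995-12-12",
         "1997-12-12", "1998-12-12", "1999-12-12"]
    ).rename("date"),
)

# last() takes the last non-null value per column, so a NaN in the final
# row of a group is replaced by an earlier value from that group.
filled = df.groupby("CUSIP").last()

# tail(1) selects the final row of each group by position and keeps the
# original date index, so NaNs are preserved.
last_rows = df.groupby("CUSIP").tail(1)

print(filled)
print(last_rows)
```

Because tail(1) keeps the original index, no reset_index/set_index round trip is needed in this sketch.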

In pandas 0.13 (release candidate out now), a faster and more direct way is to use cumcount:

In [12]: df[df.groupby('CUSIP').cumcount(ascending=False) == 0]
Out[12]: 
           CUSIP  COLA  COLB      COLC
date                                  
1992-07-13   AAA   NaN  4677  3.485577
1997-12-12   BBB   245   NaN  3.250000
1999-12-12   CCC   223  5120  3.651450

[3 rows x 4 columns]
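
As a rough illustration of how the cumcount mask works, the sketch below uses a small hand-built frame. The data and the variable names from_end and last_rows are hypothetical, chosen only for this example. Counting from the end of each group gives the final row a count of 0, and comparing against 0 yields a boolean mask.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; values are made up to mirror the question.
df = pd.DataFrame(
    {
        "CUSIP": ["AAA", "AAA", "BBB", "BBB", "CCC"],
        "COLA": [238, np.nan, 221, 245, 223],
    },
    index=pd.to_datetime(
        ["1992-05-08", "1992-07-13", "1992-12-12", "1997-12-12", "1999-12-12"]
    ).rename("date"),
)

# Number the rows of each group counting down to 0, so the last row of
# every group is labelled 0.
from_end = df.groupby("CUSIP").cumcount(ascending=False)

# Keep only those rows; this is plain boolean indexing, so NaNs are
# left untouched and the date index is preserved.
last_rows = df[from_end == 0]

print(from_end.tolist())
print(last_rows)
```

This relies on the rows already being in date order within each CUSIP, as they are in the question; if they were not, sorting first (for example with df.sort_index()) would likely be needed.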

answered Sep 30 '22 07:09

Andy Hayden
