pandas: GroupBy .pipe() vs .apply()

Tags: python, python-3.x, pandas, pandas-groupby

The pandas documentation introduces the new .pipe() method for GroupBy objects with the example below. Calling .apply() with the same lambda would return the same results.

    In [195]: import numpy as np

    In [196]: n = 1000

    In [197]: df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
       .....:                    'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
       .....:                    'Revenue': (np.random.random(n)*50+10).round(2),
       .....:                    'Quantity': np.random.randint(1, 10, size=n)})

    In [199]: (df.groupby(['Store', 'Product'])
       .....:    .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
       .....:    .unstack().round(2))

    Out[199]:
    Product  Product_1  Product_2  Product_3
    Store
    Store_1       6.93       6.82       7.15
    Store_2       6.69       6.64       6.77

I can see how the pipe functionality differs from apply for DataFrame objects, but not for GroupBy objects. Does anyone have an explanation or examples of what can be done with pipe but not with apply for a GroupBy?


asked Nov 10 '17 15:11

foglerit

1 Answer

pipe lets you pass a callable and expect that the object that called pipe is the object passed to that callable.

With apply, we assume that the object that calls apply has subcomponents, each of which gets passed to the callable in turn. In the context of a groupby, the subcomponents are slices of the dataframe that called groupby, where each slice is itself a dataframe. The same holds for a series groupby, where each slice is a series.

The main difference in a groupby context is that with pipe the callable has the entire scope of the groupby object available to it. With apply, the callable only knows about the local slice.
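As a minimal sketch of that distinction, the snippet below records the type of object each callable receives. The helper names (record_pipe, record_apply) and the two lists are made up for illustration; the data matches the setup used in this answer.

```python
import pandas as pd

df = pd.DataFrame({"A": list("XXXXYYYYYY"), "B": range(10)})
grouped = df.groupby("A")["B"]

seen_by_pipe = []   # types handed to the callable by pipe
seen_by_apply = []  # types handed to the callable by apply

def record_pipe(obj):
    # pipe passes the groupby object itself, exactly once
    seen_by_pipe.append(type(obj).__name__)
    return obj.sum()

def record_apply(obj):
    # apply passes one group's slice per call
    seen_by_apply.append(type(obj).__name__)
    return obj.sum()

pipe_result = grouped.pipe(record_pipe)
apply_result = grouped.apply(record_apply)

print(seen_by_pipe)        # the whole SeriesGroupBy
print(set(seen_by_apply))  # plain Series slices
```

Both calls produce the same per-group sums here, which is why the two methods look interchangeable for simple reductions. The difference only matters once the callable needs information beyond a single slice.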

Setup
Consider df

    import pandas as pd

    df = pd.DataFrame(dict(
        A=list('XXXXYYYYYY'),
        B=range(10)
    ))

       A  B
    0  X  0
    1  X  1
    2  X  2
    3  X  3
    4  Y  4
    5  Y  5
    6  Y  6
    7  Y  7
    8  Y  8
    9  Y  9

Example 1
Make the entire 'B' column sum to 1 while each sub-group sums to the same amount. This requires that the calculation be aware of how many groups exist. The callable passed to apply can't work this out on its own, because it only ever sees one group at a time.

    s = df.groupby('A').B.pipe(lambda g: df.B / g.transform('sum') / g.ngroups)
    s

    0    0.000000
    1    0.083333
    2    0.166667
    3    0.250000
    4    0.051282
    5    0.064103
    6    0.076923
    7    0.089744
    8    0.102564
    9    0.115385
    Name: B, dtype: float64

Note:

    s.sum()

    0.99999999999999989

And:

    s.groupby(df.A).sum()

    A
    X    0.5
    Y    0.5
    Name: B, dtype: float64
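The same calculation can also be sketched as a named function, which makes the dependence on the whole groupby object explicit. The name equal_group_weights and its values parameter are hypothetical; the sketch relies on pipe forwarding extra positional arguments to the callable.

```python
import pandas as pd

df = pd.DataFrame({"A": list("XXXXYYYYYY"), "B": range(10)})

def equal_group_weights(g, values):
    """Scale `values` so every group sums to 1 / (number of groups)."""
    group_totals = g.transform("sum")          # each row gets its group's total
    return values / group_totals / g.ngroups   # ngroups needs the whole groupby

# Extra arguments after the callable are passed through by pipe
s = df.groupby("A")["B"].pipe(equal_group_weights, df["B"])

# Check the two properties the example asks for
group_sums = s.groupby(df["A"]).sum()
print(s.sum())
print(group_sums)
```

Passing the column in as an argument avoids reaching for the outer df inside the function, at the cost of naming it twice at the call site.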

Example 2
Subtract the mean of one group from the values of another. Again, this can't be done with apply because apply doesn't know about other groups.

    # pd.concat replaces Series.append, which was removed in pandas 2.0
    df.groupby('A').B.pipe(
        lambda g: pd.concat([
            g.get_group('X') - g.get_group('Y').mean(),
            g.get_group('Y') - g.get_group('X').mean()
        ])
    )

    0   -6.5
    1   -5.5
    2   -4.5
    3   -3.5
    4    2.5
    5    3.5
    6    4.5
    7    5.5
    8    6.5
    9    7.5
    Name: B, dtype: float64
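For reference, here is a sketch of the same idea as a reusable function for exactly two groups. The function name and its first/second parameters are invented for this illustration. It combines the pieces with pd.concat, which works on current pandas versions (Series.append was removed in pandas 2.0).

```python
import pandas as pd

df = pd.DataFrame({"A": list("XXXXYYYYYY"), "B": range(10)})

def subtract_other_group_mean(g, first, second):
    """For two groups, subtract the other group's mean from each value."""
    means = g.mean()                            # one mean per group, indexed by key
    other = {first: second, second: first}      # which group to borrow the mean from
    parts = [g.get_group(key) - means[other[key]] for key in (first, second)]
    return pd.concat(parts)

result = df.groupby("A")["B"].pipe(subtract_other_group_mean, "X", "Y")
print(result)
```

The function needs both get_group and the per-group means at once, so it has to receive the groupby object rather than a single slice.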

answered Sep 18 '22 04:09

piRSquared
