I have a df as shown below.
df:
Country Player
Arg Messi
Bra Neymar
Arg NaN
Arg Messi
Arg Aguero
Arg Messi
Bra Ronaldo
Spain Xavi
Spain NaN
Spain NaN
Bra Rivaldo
Spain Iniesta
Bra NaN
Spain Xavi
Where NaN stands for information not available. From the above df, I would like to perform multiple groupby counts as shown below.
Expected output:
Country Player Counts Percentage_of_country
Arg NaN 1 20
Arg Messi 3 60
Arg Aguero 1 20
Bra Neymar 1 25
Bra NaN 1 25
Bra Ronaldo 1 25
Bra Rivaldo 1 25
Spain NaN 2 40
Spain Xavi 2 40
Spain Iniesta 1 20
I tried below code:
df2 = df.groupby(['Country', 'Player']).size().reset_index(name='counts')
df2['prcntg'] = df2['counts']/df2.groupby('Country')['counts'].transform('sum')
df2
Another way to do it, really producing all results in a single groupby is as follows:
Define a helper function to calculate the percentage:
with dropna=False to keep the NaN values:
f = lambda x: x.size / df.groupby('Country', dropna=False).size()[x.iloc[0]] * 100
The first size function returns counts under the group of ['Country', 'Player'], while the second size function, which is grouped under Country only, returns the counts under the bigger group.
Then, make use of the named aggregation of DataFrameGroupBy.aggregate():
(df.groupby(['Country', 'Player'], dropna=False)
.agg(counts=('Player', 'size'),
prcntg=('Country', f))
)
Result:
counts prcntg
Country Player
Arg Aguero 1 20.0
Messi 3 60.0
NaN 1 20.0
Bra Neymar 1 25.0
Rivaldo 1 25.0
Ronaldo 1 25.0
NaN 1 25.0
Spain Iniesta 1 20.0
Xavi 2 40.0
NaN 2 40.0
If you got error TypeError: groupby() got an unexpected keyword argument 'dropna', probably your Pandas version is older than version 1.1.0. This dropna parameter, which allows you keeping the NaN counts, is supported since this version. Probably you should consider upgrading your Pandas for richer sets of Pandas features.
If you cannot upgrade at the moment, a workaround solution is to replace NaN in the Player column by some other text, eg. string '_NaN' or some special word before the grouping. You can restore its values after the grouping if you need to. Sample codes below:
import numpy as np
df['Player'] = df['Player'].fillna('_NaN') # Set `NaN` values to string `_NaN`
# Main processing with all results produced in a single `groupby`:
f = lambda x: x.size / df.groupby('Country').size()[x.iloc[0]] * 100
df_out = (df.groupby(['Country', 'Player'], as_index=False)
.agg(counts=('Player', 'size'),
prcntg=('Country', f))
)
df_out['Player'] = df_out['Player'].replace('_NaN', np.nan) # restore `NaN` values
Result:
print(df_out)
Country Player counts prcntg
0 Arg Aguero 1 20.0
1 Arg Messi 3 60.0
2 Arg NaN 1 20.0
3 Bra Neymar 1 25.0
4 Bra Rivaldo 1 25.0
5 Bra Ronaldo 1 25.0
6 Bra NaN 1 25.0
7 Spain Iniesta 1 20.0
8 Spain Xavi 2 40.0
9 Spain NaN 2 40.0
First group the dataframe by Country, and Player, then call size for the count, and call to_frame passing the column name to create dataframe out of it. You also need to pass dropna=True since you want to include NaN.
After that, you can group the count by level=0 then call tranform to get the sum for the levels, and divide the count by this value. You can call reset_index at last if needed.
count=df.groupby(['Country', 'Player'], dropna=False).size().to_frame('Counts')
count['Percentage_of_country']=100*count/count.groupby(level=0).transform('sum')
OUTPUT:
Counts Percentage_of_country
Country Player
Arg Aguero 1 20.0
Messi 3 60.0
NaN 1 20.0
Bra Neymar 1 25.0
Rivaldo 1 25.0
Ronaldo 1 25.0
NaN 1 25.0
Spain Iniesta 1 20.0
Xavi 2 40.0
NaN 2 40.0
The dropna parameter was introduced in pandas version 1.1.0, so if you are using a version older than that, you can first try to replace the NaN value to something else, then revert back to NaN after performing the required operation.
df['Player'] = df['Player'].fillna('#!Missing!#') #replace NaN by #!Missing!#'
count=df.groupby(['Country', 'Player']).size().to_frame('Counts')
count['Percentage_of_country']=100*count/count.groupby(level=0).transform('sum')
count.reset_index(inplace=True)
count['Player'] = count['Player'].replace({'#!Missing!#':float('nan')})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With