Transform Pandas DataFrame of categorical variables to MultiIndex with count and proportion

Question

I have a Pandas DataFrame containing several categorical variables. For example:

import pandas as pd

d = {'grade':['A','B','C','A','B'], 
    'year':['2013','2013','2013','2012','2012']}

df = pd.DataFrame(d)

I would like to transform this to a MultiIndex DataFrame with the following properties:

First level index is the variable name (e.g. 'grade')
Second level index is the levels within the variable (e.g. 'A', 'B', 'C')
One column contains 'n', a count of the number of times the level appears
A second column contains 'proportion', the proportion represented by this level.

For example:

[image: the desired MultiIndex DataFrame, with 'n' and 'proportion' columns]

Could anyone suggest a method for creating this MultiIndex DataFrame?

Scott Boston · Accepted Answer

Another way to do this is to use melt and groupby:

df_out = df.melt().groupby(['variable','value']).size().to_frame(name='n')
# Series.sum(level=...) was removed in pandas 2.0; group on the index level instead
df_out['proportion'] = df_out['n'].div(df_out['n'].groupby(level=0).sum(), level=0)
print(df_out)

Output:

                n  proportion
variable value               
grade    A      2         0.4
         B      2         0.4
         C      1         0.2
year     2012   2         0.4
         2013   3         0.6

And, if you really want to get crazy and do it in a one-liner:

(df.melt().groupby(['variable','value']).size().to_frame(name='n')
  .pipe(lambda x: x.assign(proportion = x[['n']]/x.groupby(level=0).transform('sum'))))

Upgraded solution using @Wen's pct calculation:

(df.melt().groupby(['variable','value']).size().to_frame(name='n')
  .pipe(lambda x: x.assign(proportion = x['n'].div(x['n'].groupby(level=0).sum(),level=0))))
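
If reshaping with melt feels indirect, a different route is to count each column on its own and let pd.concat build the outer index level from the column names. This is a swapped-in technique rather than anything from the answers here, and the names parts and result are hypothetical; treat it as a minimal sketch under the assumption that every column should be treated as categorical.

```python
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'C', 'A', 'B'],
                   'year': ['2013', '2013', '2013', '2012', '2012']})

# One small frame per column: raw counts next to normalized counts
parts = {
    col: pd.DataFrame({
        'n': df[col].value_counts(),
        'proportion': df[col].value_counts(normalize=True),
    }).sort_index()
    for col in df.columns
}

# The dict keys (column names) become the outer index level
result = pd.concat(parts).rename_axis(['variable', 'value'])
print(result)
```

Since each column is counted separately, no NaN padding is introduced and 'n' stays an integer column.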

BENY · Answer

You can try this:

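# pd.value_counts is deprecated and Series.sum(level=...) was removed in pandas 2.0
df1=df.apply(pd.Series.value_counts).stack().swaplevel(0,1).to_frame('n')
df1['pct']=df1['n'].div(df1['n'].groupby(level=0).sum(),level=0)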
df1
Out[89]: 
              n  pct
year  2012  2.0  0.4
      2013  3.0  0.6
grade A     2.0  0.4
      B     2.0  0.4
      C     1.0  0.2
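
Note that 'n' comes out as a float (2.0 rather than 2) in the output above. A likely explanation is that apply aligns the per-column counts on the union of all values, so a value that never occurs in a column is filled with NaN, which forces a float dtype. The sketch below, with the hypothetical intermediate name wide, drops those gaps explicitly and casts the counts back to integers; it assumes no genuine missing values need to be preserved.

```python
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'C', 'A', 'B'],
                   'year': ['2013', '2013', '2013', '2012', '2012']})

# Counts per column, aligned on the union of all values; gaps become NaN
wide = df.apply(pd.Series.value_counts)

# Drop the NaN gaps explicitly, then restore integer counts
df1 = wide.stack().dropna().swaplevel(0, 1).astype(int).to_frame('n')
df1['pct'] = df1['n'].div(df1['n'].groupby(level=0).sum(), level=0)
print(df1)
```

The explicit dropna() is there because whether stack() discards NaN on its own depends on the pandas version.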

Tags: python, pandas, dataframe, categorical-data

tomp
