I have a Pandas DataFrame containing several categorical variables. For example:
import pandas as pd
d = {'grade':['A','B','C','A','B'], 
    'year':['2013','2013','2013','2012','2012']}
df = pd.DataFrame(d)

I would like to transform this to a MultiIndex DataFrame with the following properties:
For example:

Could anyone suggest a method for creating this MultiIndex DataFrame?
Another way to do this is to use melt and groupby:
df_out = df.melt().groupby(['variable','value']).size().to_frame(name='n')
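# Series.sum(level=...) was removed in pandas 2.0; groupby(level=0).sum() gives the same per-variable totals
df_out['proportion'] = df_out['n'].div(df_out['n'].groupby(level=0).sum(), level=0)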
print(df_out)
Output:
                n  proportion
variable value               
grade    A      2         0.4
         B      2         0.4
         C      1         0.2
year     2012   2         0.4
         2013   3         0.6
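If it helps to see what each step is doing, here is the same approach broken apart. This is only a sketch: the names long_df, counts and totals are mine, and I'm using groupby(level=0).transform('sum') for the per-variable totals, which should be equivalent to the div(..., level=0) form.

```python
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'C', 'A', 'B'],
                   'year': ['2013', '2013', '2013', '2012', '2012']})

# Step 1: melt reshapes to long format, one row per (variable, value) pair
long_df = df.melt()

# Step 2: count how often each (variable, value) pair occurs
counts = long_df.groupby(['variable', 'value']).size().to_frame(name='n')

# Step 3: divide each count by the total for its variable (index level 0)
counts['proportion'] = counts['n'] / counts.groupby(level=0)['n'].transform('sum')

# Sanity check: the proportions within each variable should add up to 1
totals = counts.groupby(level=0)['proportion'].sum()
print(counts)
```

The melt step is what lets a single groupby handle every column at once, so this should extend to more categorical columns without changes.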
And, if you really want to get crazy and do it in a one-liner:
(df.melt().groupby(['variable','value']).size().to_frame(name='n')
  .pipe(lambda x: x.assign(proportion = x['n']/x.groupby(level=0)['n'].transform('sum'))))
Updated solution using @Wen's pct calculation:
(df.melt().groupby(['variable','value']).size().to_frame(name='n')
  .pipe(lambda x: x.assign(proportion = x['n'].div(x['n'].groupby(level=0).sum(),level=0))))
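You can try this:
# pd.value_counts is deprecated; pd.Series.value_counts is the equivalent method
df1=df.apply(pd.Series.value_counts).stack().swaplevel(0,1).to_frame('n')
# Series.sum(level=...) was removed in pandas 2.0; use groupby(level=0).sum() instead
df1['pct']=df1['n'].div(df1['n'].groupby(level=0).sum(),level=0)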
df1
Out[89]: 
              n  pct
year  2012  2.0  0.4
      2013  3.0  0.6
grade A     2.0  0.4
      B     2.0  0.4
      C     1.0  0.2
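Note that n comes out as floats here (2.0, 3.0) because apply with value_counts fills the missing column/value combinations with NaN before stack drops them. If integer counts matter, one possible variation is to count each column separately and concatenate. This is a different technique from the one above, sketched under the assumption that a plain loop is acceptable; parts and df2 are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'C', 'A', 'B'],
                   'year': ['2013', '2013', '2013', '2012', '2012']})

# Build one small frame per column, keyed by the column name
parts = {}
for col in df.columns:
    counts = df[col].value_counts().sort_index()
    parts[col] = pd.DataFrame({'n': counts, 'pct': counts / counts.sum()})

# Concatenating a dict uses its keys as the outer index level
df2 = pd.concat(parts)
df2.index.names = ['variable', 'value']
print(df2)
```

Since no NaN is ever introduced, n should stay an integer column.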