
Transform Pandas DataFrame of categorical variables to MultiIndex with count and proportion

I have a Pandas DataFrame containing several categorical variables. For example:

import pandas as pd

d = {'grade':['A','B','C','A','B'], 
    'year':['2013','2013','2013','2012','2012']}

df = pd.DataFrame(d)


I would like to transform this to a MultiIndex DataFrame with the following properties:

  • First level index is the variable name (e.g. 'grade')
  • Second level index is the levels within the variable (e.g. 'A', 'B', 'C')
  • One column contains 'n', a count of the number of times the level appears
  • A second column contains 'proportion', the proportion represented by this level.

For example, level 'A' of 'grade' appears in 2 of the 5 rows, so its row would have n = 2 and proportion = 0.4.

Could anyone suggest a method for creating this MultiIndex DataFrame?

tomp asked Oct 29 '25 14:10


2 Answers

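Another way to do this is to use melt and groupby:

# Reshape to long form, then count each (variable, value) pair
df_out = df.melt().groupby(['variable','value']).size().to_frame(name='n')
# Divide each count by its variable's total (sum(level=0) was removed in pandas 2.0)
df_out['proportion'] = df_out['n'].div(df_out['n'].groupby(level=0).sum(), level=0)
print(df_out)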

Output:

                n  proportion
variable value               
grade    A      2         0.4
         B      2         0.4
         C      1         0.2
year     2012   2         0.4
         2013   3         0.6

And, if you really want to get crazy and do it in a one-liner:

(df.melt().groupby(['variable','value']).size().to_frame(name='n')
  .pipe(lambda x: x.assign(proportion = x[['n']]/x.groupby(level=0).transform('sum'))))

Upgraded solution using @Wen's pct calculation:

(df.melt().groupby(['variable','value']).size().to_frame(name='n')
  .pipe(lambda x: x.assign(proportion = x['n'].div(x['n'].groupby(level=0).sum(), level=0))))
Scott Boston answered Oct 31 '25 04:10


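You can try this:

# Count values per column, then stack into a (variable, level) index;
# dropna() keeps this working on pandas versions where stack() no longer drops NaN
df1=df.apply(pd.Series.value_counts).stack().dropna().swaplevel(0,1).to_frame('n')
# sum(level=0) was removed in pandas 2.0; group on the first index level instead
df1['pct']=df1['n'].div(df1['n'].groupby(level=0).sum(),level=0)
df1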
Out[89]: 
              n  pct
year  2012  2.0  0.4
      2013  3.0  0.6
grade A     2.0  0.4
      B     2.0  0.4
      C     1.0  0.2
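
The counts print as floats (2.0 rather than 2) because applying value_counts column by column leaves NaN wherever a value does not occur in a column. A minimal sketch of that intermediate frame, and of one way to get integer counts back, follows. It assumes a recent pandas and uses pd.Series.value_counts in place of pd.value_counts; the name wide is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'grade': ['A', 'B', 'C', 'A', 'B'],
                   'year': ['2013', '2013', '2013', '2012', '2012']})

# One column of counts per variable; a value missing from a column
# becomes NaN, which forces the whole frame to float
wide = df.apply(pd.Series.value_counts)
print(wide)

# Stack to long form, drop the NaN cells, and cast the counts back to int
df1 = wide.stack().dropna().astype(int).swaplevel(0, 1).to_frame('n')
df1['pct'] = df1['n'].div(df1['n'].groupby(level=0).sum(), level=0)
print(df1)
```

The cast is safe here only because the NaN cells have already been dropped.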
BENY answered Oct 31 '25 05:10


