
Sort pandas multi-index by count?

I have a dataframe, generated by a multi-index operation, that looks like this.

                      Col3
Col1        Col2
A              A1         N
B              B1         N
               B2         N
C              C1         N
               C2         N
               C3         N 

I'm trying to sort this dataframe by the descending count of rows under each first-level key (A, B, and C). In this case, A has 1 row, B has 2 rows, and C has 3 rows, so the output would be

                      Col3
Col1        Col2
C              C1         N
               C2         N
               C3         N
B              B1         N
               B2         N
A              A1         N 

I could do this manually by counting the rows under each first-level key and adding a column to the df to sort by, but is there a more elegant way? If so, can it be generalized to more levels?

Thank you!

EDIT: Code to generate original dataframe.

import pandas as pd

df = pd.DataFrame(
    [['a', 'z', 'x', 0.123], ['a', 'z', 'x', 0.234], ['a', 'z', 'y', 0.451],
     ['b', 'z', 'x', 0.453], ['b', 'z', 'x', 0.453],
     ['b', 'z', 'x', 0.453], ['b', 'z', 'x', 0.453]],
    columns=['first', 'second', 'value1', 'value2'],
).set_index(['first', 'second'])

Running df.ix[df.groupby(level=0).size().sort_values(ascending=False).index,:] on this dataframe raises TypeError: Expected tuple, got str

Jack Florey asked Sep 05 '25 03:09


1 Answer

If I understand correctly, you can group on the first index level, sort the group sizes in descending order, and use the resulting index to select from your df:

In [25]:
df.ix[df.groupby(level=0).size().sort_values(ascending=False).index,:]

Out[25]:
          Col3
Col1 Col2     
C    C1      N
     C2      N
     C3      N
B    B1      N
     B2      N
A    A1      N

Breaking the above down:

In [26]:
df.groupby(level=0).size()

Out[26]:
Col1
A    1
B    2
C    3
dtype: int64

In [27]:
df.groupby(level=0).size().sort_values(ascending=False)

Out[27]:
Col1
C    3
B    2
A    1
dtype: int64

In [28]:
df.groupby(level=0).size().sort_values(ascending=False).index

Out[28]:
Index(['C', 'B', 'A'], dtype='object', name='Col1')
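Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the one-liner above will not run on current versions. Below is a minimal sketch of the same idea for newer pandas, assuming the question's first frame (rebuilt here by hand, with 'N' as the placeholder value). Stitching the groups together with pd.concat is a swapped-in way to apply the order, not the method used in the answer above.

```python
import pandas as pd

# Rebuild the question's first frame; Col3 holds the placeholder 'N'.
index = pd.MultiIndex.from_tuples(
    [('A', 'A1'), ('B', 'B1'), ('B', 'B2'),
     ('C', 'C1'), ('C', 'C2'), ('C', 'C3')],
    names=['Col1', 'Col2'],
)
df = pd.DataFrame({'Col3': ['N'] * 6}, index=index)

# First-level keys ordered by descending group size.
order = df.groupby(level=0).size().sort_values(ascending=False).index

# Select each first-level group in that order and concatenate the pieces.
result = pd.concat([df.loc[[key]] for key in order])
print(result)
```

Selecting with a one-element list (df.loc[[key]]) keeps both index levels on each piece, so the concatenated result has the same MultiIndex layout as the input.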

EDIT

OK, this was trickier than I expected, but the following works:

In [76]:
i = df.index.get_level_values(0)
df.iloc[i.reindex(df.groupby(level=0).size().sort_values(ascending=False).index)[1]]

Out[76]:
             value1  value2
first second               
b     z           x   0.453
      z           x   0.453
      z           x   0.453
      z           x   0.453
a     z           x   0.123
      z           x   0.234
      z           y   0.451

This takes the first-level index values, reindexes them against the index of the sorted groupby result, and then uses the integer positions that reindex returns (the second element of its result) to select rows from the original df with iloc.
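The same positional idea can also be sketched without Index.reindex. This is only an illustration under stated assumptions: it looks up each row's group size, then uses a stable argsort so that rows keep their original relative order within a group. It assumes the rows of each first-level key are already contiguous, as in the question's data; the names sizes, row_sizes and positions are introduced here and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['a', 'z', 'x', 0.123], ['a', 'z', 'x', 0.234], ['a', 'z', 'y', 0.451],
     ['b', 'z', 'x', 0.453], ['b', 'z', 'x', 0.453],
     ['b', 'z', 'x', 0.453], ['b', 'z', 'x', 0.453]],
    columns=['first', 'second', 'value1', 'value2'],
).set_index(['first', 'second'])

# Number of rows under each first-level key.
sizes = df.groupby(level=0).size()

# Look up the group size for every row via its first-level key.
row_sizes = df.index.get_level_values(0).map(sizes).to_numpy()

# Stable argsort on the negated sizes: largest groups first,
# original row order preserved inside each group.
positions = np.argsort(-row_sizes, kind='stable')

result = df.iloc[positions]
print(result)
```

Because the selection is purely positional, duplicate index entries such as the repeated ('b', 'z') rows are not a problem here.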

EdChum answered Sep 07 '25 21:09