Sort pandas multi-index by count?

Question

I have a dataframe, generated by a multi-index operation, that looks like this.

          Col3
Col1 Col2
A    A1      N
B    B1      N
     B2      N
C    C1      N
     C2      N
     C3      N

I'm trying to sort this dataframe by the descending count of rows under each first-level (Col1) value: A, B, and C. In this case, A has 1 row, B has 2 rows, and C has 3 rows, so the output would be

          Col3
Col1 Col2
C    C1      N
     C2      N
     C3      N
B    B1      N
     B2      N
A    A1      N

I can think of doing this manually by counting the number of rows under each first-level value and adding a column to the df to sort by, but is there a more elegant way? If so, is there a way to generalize it to more levels?

Thank you!

EDIT: Code to generate original dataframe.

import pandas as pd

df = pd.DataFrame([['a', 'z', 'x', 0.123], ['a', 'z', 'x', 0.234],
                   ['a', 'z', 'y', 0.451], ['b', 'z', 'x', 0.453],
                   ['b', 'z', 'x', 0.453], ['b', 'z', 'x', 0.453], ['b', 'z', 'x', 0.453]],
                  columns=['first', 'second', 'value1', 'value2']
                  ).set_index(['first', 'second'])

Running df.ix[df.groupby(level=0).size().sort_values(ascending=False).index,:] produces TypeError: Expected tuple, got str

EdChum · Accepted Answer

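If I understand correctly, you can group by the first index level, sort the group sizes, and use the resulting index to select from your df. (Note that .ix was deprecated in pandas 0.20 and removed in 1.0, so the line below needs an older pandas.)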

In [25]:
df.ix[df.groupby(level=0).size().sort_values(ascending=False).index,:]

Out[25]:
          Col3
Col1 Col2     
C    C1      N
     C2      N
     C3      N
B    B1      N
     B2      N
A    A1      N

Breaking the above down:

In [26]:
df.groupby(level=0).size()

Out[26]:
Col1
A    1
B    2
C    3
dtype: int64

In [27]:
df.groupby(level=0).size().sort_values(ascending=False)

Out[27]:
Col1
C    3
B    2
A    1
dtype: int64

In [28]:
df.groupby(level=0).size().sort_values(ascending=False).index

Out[28]:
Index(['C', 'B', 'A'], dtype='object', name='Col1')
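For pandas versions without .ix, here is a minimal sketch of the same idea using positional selection. It rebuilds the Col1/Col2 frame from the question (the construction is my assumption about how the frame was made), looks up each row's group size, and stable-sorts the rows by that size in descending order. The names sizes, row_counts and order are just illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame(
    {
        "Col1": ["A", "B", "B", "C", "C", "C"],
        "Col2": ["A1", "B1", "B2", "C1", "C2", "C3"],
        "Col3": ["N"] * 6,
    }
).set_index(["Col1", "Col2"])

# Number of rows under each first-level label.
sizes = df.groupby(level=0).size()

# Look up the group size for every row via its first-level label.
row_counts = df.index.get_level_values(0).map(sizes).to_numpy()

# Stable descending sort: rows keep their original relative order
# within a group and between groups of equal size.
order = np.argsort(-row_counts, kind="stable")
result = df.iloc[order]
print(result)
```

Because the sort is stable, groups with the same count stay in their original order; this assumes the rows of each group are already adjacent, as they are here.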

EDIT

OK, this was trickier than I expected, but the following works:

In [76]:
i = df.index.get_level_values(0)
df.iloc[i.reindex(df.groupby(level=0).size().sort_values(ascending=False).index)[1]]

Out[76]:
             value1  value2
first second               
b     z           x   0.453
      z           x   0.453
      z           x   0.453
      z           x   0.453
a     z           x   0.123
      z           x   0.234
      z           y   0.451

This takes the first-level index values, reindexes them against the sorted groupby result, and uses the resulting integer positions to select rows from the original df with iloc.
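The same "labels to integer positions" step can also be sketched with Index.get_indexer_non_unique, which is meant for indexes with duplicate labels like this one. This is an alternative I am swapping in, not the reindex call used above; the names first_level, ordered_labels and positions are illustrative.

```python
import pandas as pd

# Frame from the question's EDIT, with duplicate (first, second) index entries.
df = pd.DataFrame(
    [['a', 'z', 'x', 0.123], ['a', 'z', 'x', 0.234], ['a', 'z', 'y', 0.451],
     ['b', 'z', 'x', 0.453], ['b', 'z', 'x', 0.453], ['b', 'z', 'x', 0.453],
     ['b', 'z', 'x', 0.453]],
    columns=['first', 'second', 'value1', 'value2'],
).set_index(['first', 'second'])

# First-level label of every row (contains duplicates).
first_level = df.index.get_level_values(0)

# First-level labels ordered by descending group size.
ordered_labels = df.groupby(level=0).size().sort_values(ascending=False).index

# For each ordered label, collect the integer positions of all matching rows.
# The second return value lists target labels that were not found.
positions, missing = first_level.get_indexer_non_unique(ordered_labels)

result = df.iloc[positions]
print(result)
```

Since every label in ordered_labels comes from the frame itself, missing should be empty here.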

Tags: python, sorting, pandas

Jack Florey
