In Pandas 0.17 I am trying to sort by a specific column while maintaining the hierarchical index (A and B). B is a running number created when setting up the dataframe through concatenation. My data looks like this:
          C      D
A   B
bar one   shiny  10
    two   dull   5
    three glossy 8
foo one   dull   3
    two   shiny  9
    three matt   12
This is what I need:
          C      D
A   B
bar two   dull   5
    three glossy 8
    one   shiny  10
foo one   dull   3
    three matt   12
    two   shiny  9
Below is the code I am using and the result. Note: Pandas 0.17 warns that DataFrame.sort is deprecated.
df.sort_values(by="C", ascending=True)
          C      D
A   B
bar two   dull   5
foo one   dull   3
bar three glossy 8
foo three matt   12
bar one   shiny  10
foo two   shiny  9
Adding .groupby produces the same result:
df.sort_values(by="C", ascending=True).groupby(axis=0, level=0, as_index=True)
Similarly, sorting the index first and then grouping by the column does not help:
df.sort_index(axis=0, level=0).groupby("C", as_index=True)
I am not certain about reindexing. I need to keep the first index A; the second index B can be reassigned, but does not have to be. It would surprise me if there were no easy solution; I guess I just can't find it. Any suggestions are appreciated.
Edit: In the meantime I dropped the second index B, turned the first index A into a column instead of an index, sorted by multiple columns, then re-indexed:
df.index = df.index.droplevel(1)
df.reset_index(level=0, inplace=True)
df_sorted = df.sort_values(["A", "C"], ascending=[True, True])  # A is a column here, not an index.
df_reindexed = df_sorted.set_index("A")
Still very verbose.
You can sort a pandas DataFrame by one or more columns with the sort_values() method, in ascending or descending order. The order is set with the boolean ascending parameter: True for ascending, False for descending.
To sort the DataFrame by the values in a single column, use .sort_values(). By default it returns a new DataFrame sorted in ascending order and does not modify the original.
However, to sort a MultiIndex at a specific level, use the MultiIndex.sortlevel() method, passing the level as an argument. To sort in descending order, set the ascending parameter to False.
To rearrange the levels of a MultiIndex, use the MultiIndex.reorder_levels() method, setting the new order of levels with the order parameter.
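A minimal sketch of the calls described above, assuming the frame from the question is rebuilt by hand (the original post built it through concatenation, so the construction here is only illustrative, as are the variable names):

```python
import pandas as pd

# Rebuild the question's frame; the values are copied from the post.
index = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three"]], names=["A", "B"]
)
df = pd.DataFrame(
    {
        "C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
        "D": [10, 5, 8, 3, 9, 12],
    },
    index=index,
)

# sort_values returns a new frame; df itself is left untouched.
by_d_desc = df.sort_values(by="D", ascending=False)

# Several columns, each with its own direction.
by_c_then_d = df.sort_values(by=["C", "D"], ascending=[True, False])

# MultiIndex.sortlevel returns a (sorted_index, indexer) pair.
sorted_index, indexer = df.index.sortlevel(level="B", ascending=False)

# reorder_levels changes the order of the levels, not the order of the rows.
swapped = df.index.reorder_levels(["B", "A"])
```

Note that none of these calls on its own sorts by C within each A group, which is what the question asks for.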
Feels like there could be a better way, but here's one approach:
In [163]: def sorter(sub_df):
     ...:     sub_df = sub_df.sort_values('C')
     ...:     sub_df.index = sub_df.index.droplevel(0)
     ...:     return sub_df
In [164]: df.groupby(level='A').apply(sorter)
Out[164]: 
                C   D
A   B                
bar two      dull   5
    three  glossy   8
    one     shiny  10
foo one      dull   3
    three    matt  12
    two     shiny   9
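For readers on a newer pandas than the 0.17 in the question, a shorter route may be available: as far as I know, since pandas 0.23 sort_values accepts index level names alongside column labels in by. A sketch under that assumption, with the question's frame rebuilt by hand:

```python
import pandas as pd

# Rebuild the question's frame; the values are copied from the post.
index = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three"]], names=["A", "B"]
)
df = pd.DataFrame(
    {
        "C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
        "D": [10, 5, 8, 3, 9, 12],
    },
    index=index,
)

# "A" is an index level and "C" is a column; both can appear in `by`
# (assumes pandas >= 0.23). The MultiIndex is kept as-is.
result = df.sort_values(["A", "C"])
print(result)
```

This avoids both the groupby/apply round trip and the drop-and-reset-index workaround from the question's edit.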
Based on chrisb's code (note that in my case it's a Series, not a DataFrame):
s.groupby(level='A', group_keys=False).apply(lambda x: x.sort_values(ascending=False))
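A self-contained version of that one-liner, as a sketch. The Series s is an assumption here: it is simply column D of the question's frame, since the answer does not show its own data.

```python
import pandas as pd

# Hypothetical Series: column D of the question's frame.
index = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two", "three"]], names=["A", "B"]
)
s = pd.Series([10, 5, 8, 3, 9, 12], index=index, name="D")

# Sort the values in descending order within each A group.
# group_keys=False keeps apply from prepending the group key,
# so the result still carries the original (A, B) index.
sorted_s = s.groupby(level="A", group_keys=False).apply(
    lambda x: x.sort_values(ascending=False)
)
print(sorted_s)
```

Because group_keys=False leaves the index alone, there is no need to drop a level inside the applied function as the sorter helper above does.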