I have a Pandas DataFrame which contains tracks, score and some other columns.
I want to group by on "tracks" and then sorting of those groups based on the maximum value on the "score".
Example DataFrame:
tracks score
20 2.2
20 1.5
25 3.5
24 1.2
24 5.5
Expected Output (I want to compare the highest values from each group and sort all the group from highest to lowest, though I don't want to lose any other data - means I want to show all the rows):
tracks score
24 5.5
1.2
25 3.5
20 2.2
1.5
Currently, I am getting following output (my score is sorted but after group by my tracks are sorted based on the track numbers):
tracks score
20 2.2
1.5
24 5.5
4.2
25 3.5
My Approach till now: 1. I've sorted all the values by the score
sub_df = sub_df.sort_values("score")
url_dict = sub_df.groupby('track')['url'].apply(list).to_dict()
I also tried to used OrderedDict but it is of no use (at least for now) as the groupBy command is sending the wrong sequence of data.
Pandas = 0.23, Python = 3.6.4
Create helper column by GroupBy.transform and sort by multiple columns by DataFrame.sort_values, last remove helper column:
sub_df['max'] = sub_df.groupby('tracks')['score'].transform('max')
sub_df = sub_df.sort_values(["max","score"], ascending=False).drop('max', axis=1)
#if necessary sorting also by tracks column
#sub_df = sub_df.sort_values(["max","tracks","score"], ascending=False).drop('max', axis=1)
print (sub_df)
tracks score
4 24 5.5
3 24 1.2
2 25 3.5
0 20 2.2
1 20 1.5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With