
Pandas - Groupby dataframe store as dataframe without aggregating

I'm new to Pandas. I've read a lot of documentation, posts, and answers here, but I haven't been able to work out a good strategy for my goal. Sorry if it's already been answered; I couldn't find it. Here is what I have:

import pandas as pd

df = {'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]}
df = pd.DataFrame(df)
df
  key  value
0   A      2
1   B      2
2   A      1
3   B      1

I know that groupby() returns a groupby object, and that I can do a lot of aggregating (count, size, mean, etc.) with it. However, I don't want to aggregate. I just want to group my dataframe by the 'key' column and store the result as a dataframe like the following:

  key  value
0   A      2
1   A      1
2   B      2
3   B      1

Once I get this step done, what I eventually want is to order each group by value like the following:

  key  value
0   A      1
1   A      2
2   B      1
3   B      2

Any answer, comment, or hint is greatly appreciated. Thanks!

Fred asked Oct 18 '25 10:10


2 Answers

You can get your desired output by sorting your dataframe with sort_values instead of doing a groupby.

df.sort_values(['key', 'value'], inplace=True)
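As a minimal sketch of the line above on the question's example data: sorting by both columns puts equal keys next to each other and orders the values within each key. The reset_index(drop=True) call is an addition here, not part of the answer; it assumes you also want the rows renumbered from 0 as in the desired output.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# Sort by key first, then by value within each key.
# reset_index(drop=True) is an extra step (an assumption about the wanted
# output): it discards the old row labels and renumbers the rows from 0.
result = df.sort_values(['key', 'value']).reset_index(drop=True)
print(result)
```

This version returns a new dataframe instead of using inplace=True, so the original df is left unchanged.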

Edit:

If you really want to use groupby to group the keys, you could apply a trivial filter to the groupby object.

df = df.groupby('key').filter(lambda x: True)

This doesn't seem like the best way to get a dataframe back, but nothing else immediately comes to mind. Note that filter returns the rows in their original order, so it does not rearrange them by key. Afterwards you'd still need to use sort_values to order the rows.
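Another way to go through groupby without aggregating, offered here only as a sketch and not as part of the answer above, is to iterate over the groupby object and concatenate the pieces. The names pieces and grouped are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# The trivial filter keeps every row and hands back a plain dataframe.
filtered = df.groupby('key').filter(lambda g: True)

# Iterating over a groupby object yields (key, sub-dataframe) pairs,
# with the keys in sorted order by default. Sort each piece by value,
# then glue the pieces back together and renumber the rows.
pieces = [group.sort_values('value') for _, group in df.groupby('key')]
grouped = pd.concat(pieces).reset_index(drop=True)
print(grouped)
```

For this small example the result matches what sort_values(['key', 'value']) gives; the loop is mainly useful when each group needs some per-group handling before being put back together.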

root answered Oct 20 '25 08:10


If the reason you want to use groupby is to preserve the index structure then you can do the following:

df = {'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]}
df = pd.DataFrame(df)
print(df) 

  key  value
0   A      2
1   B      2
2   A      1
3   B      1

First, set key as the index:

df.set_index(['key'], inplace=True)
print(df)

     value
key       
A        2
B        2
A        1
B        1

Then, sort the index:

df.sort_index(inplace=True)
print(df)

     value
key       
A        2
A        1
B        2
B        1

Then, sort the values (this sorts by value alone, so the rows are no longer grouped by key):

df.sort_values('value', inplace=True)
print(df)

     value
key       
A        1
B        1
A        2
B        2

Finally, if you want key back as a regular column with a default integer index, do:

df.reset_index(inplace=True)
print(df)

  key  value
0   A      1
1   B      1
2   A      2
3   B      2
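If the goal is the ordering from the question (values sorted within each key), one possible variation on the steps above is to sort by the index level and the column in a single call. This is a sketch under the assumption that key has been set as the index as shown earlier; it is not the answer's own code.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})
df = df.set_index('key')

# sort_values accepts index level names alongside column names,
# so 'key' (now the index) and 'value' can be sorted together.
# reset_index() then moves 'key' back into a regular column.
result = df.sort_values(['key', 'value']).reset_index()
print(result)
```

Sorting on both names at once keeps the keys together, which sorting on value alone does not.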

Santiago answered Oct 20 '25 06:10


