Pandas

Question

I'm new to Pandas. I've read a lot of documentation, posts, and answers here, but I haven't been able to work out a good strategy for my goal. Sorry if this has already been answered; I couldn't find it. Here is what I have:

import pandas as pd

data = {'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]}
df = pd.DataFrame(data)
df
   key  value
0   A   2
1   B   2
2   A   1
3   B   1

I know that groupby() returns a GroupBy object, and that I can do a lot of aggregation with it (count, size, mean, etc.). However, I don't want to aggregate; I just want to group my dataframe by the 'key' column and store the result as a dataframe, like the following:

   key  value
0   A   2
1   A   1
2   B   2
3   B   1

Once that step is done, what I ultimately want is to sort the rows within each group by value, like the following:

   key  value
0   A   1
1   A   2
2   B   1
3   B   2

Any answer, comment, or hint is greatly appreciated. Thanks!

root · Accepted Answer

You can get your desired output by sorting your dataframe with sort_values instead of doing a groupby.

df.sort_values(['key', 'value'], inplace=True)
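A minimal runnable version of the same idea, assuming you also want a fresh 0..n-1 index like the desired output in the question. sort_values keeps the original row labels, so without the reset the sorted frame would be indexed 2, 0, 3, 1:

```python
import pandas as pd

data = {'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]}
df = pd.DataFrame(data)

# Sort by 'key' first, then by 'value' within each key.
# reset_index(drop=True) discards the old row labels (2, 0, 3, 1)
# and replaces them with 0..n-1.
df = df.sort_values(['key', 'value']).reset_index(drop=True)
print(df)
#   key  value
# 0   A      1
# 1   A      2
# 2   B      1
# 3   B      2
```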

Edit:

If you really want to use groupby to group the keys, you could apply a trivial filter to the groupby object:

df = df.groupby('key').filter(lambda x: True)

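This doesn't seem like the best way to get a dataframe back, but nothing else immediately comes to mind. Note that filter only drops groups; the rows it keeps come back in their original order (A, B, A, B here), so you'd still need sort_values to bring the keys together and to order the value column.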
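If you do want groupby itself to do the grouping, one option (a sketch, not part of the original answer) is to iterate over the groupby object, which yields (key, sub-DataFrame) pairs in sorted key order by default, sort each sub-frame, and stitch the pieces back together with pd.concat:

```python
import pandas as pd

data = {'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]}
df = pd.DataFrame(data)

# Iterating a GroupBy yields (group_name, sub_dataframe) pairs;
# with the default sort=True the groups come out in key order (A, then B).
groups = [g.sort_values('value') for _, g in df.groupby('key')]

# Stack the sorted groups into one DataFrame with a fresh 0..n-1 index.
result = pd.concat(groups, ignore_index=True)
print(result)
```

Dropping the .sort_values('value') call gives the intermediate grouped-only layout from the question (A 2, A 1, B 2, B 1), since each group keeps its rows in their original order.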

Santiago · Answer

If the reason you want to use groupby is to preserve the index structure, you can do the following:

import pandas as pd

data = {'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]}
df = pd.DataFrame(data)
print(df)

  key  value
0   A      2
1   B      2
2   A      1
3   B      1

So, first create the index:

df.set_index(['key'], inplace=True)
print(df)

     value
key       
A        2
B        2
A        1
B        1
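With 'key' as the index, each group can be pulled out by its label, which is much of what the index structure buys you. A small illustration (not part of the original answer):

```python
import pandas as pd

data = {'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]}
df = pd.DataFrame(data).set_index('key')

# A label that appears more than once in the index returns every
# matching row as a DataFrame (here the two 'A' rows, values 2 and 1).
group_a = df.loc['A']
print(group_a)
```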

Then, sort the index:

df.sort_index(inplace=True)
print(df)

     value
key       
A        2
A        1
B        2
B        1
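sort_index uses quicksort by default, which NumPy does not guarantee to be stable, so rows sharing a key are not guaranteed to keep their original relative order in general (in the output above they do). If that order matters, a sketch that requests a stable algorithm explicitly:

```python
import pandas as pd

data = {'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]}
df = pd.DataFrame(data).set_index('key')

# kind='mergesort' is a stable sort: rows with equal keys stay in
# their original relative order (A: 2 then 1, B: 2 then 1).
df = df.sort_index(kind='mergesort')
print(df)
```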

Then, sort the values:

df.sort_values('value',inplace=True)
print(df)

     value
key       
A        1
B        1
A        2
B        2
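Sorting by 'value' alone interleaves the keys again (A, B, A, B), which is not the layout the question asks for. Since sort_values accepts index level names as well as column labels in by (pandas 0.23 and later), one way to keep the index-based approach and still get A 1, A 2, B 1, B 2 is to sort on the 'key' level and the 'value' column together. This is a sketch, not the original answer's code:

```python
import pandas as pd

data = {'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]}
df = pd.DataFrame(data).set_index('key')

# 'key' is an index level here and 'value' is a column; sort_values
# accepts both in `by`, sorting by key first and value second.
df = df.sort_values(['key', 'value'])

# Move 'key' back into a column and get a fresh 0..n-1 index.
df = df.reset_index()
print(df)
#   key  value
# 0   A      1
# 1   A      2
# 2   B      1
# 3   B      2
```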

And if you want 'key' back as a regular column with a fresh 0..n-1 index, finally do:

df.reset_index(inplace=True)
print(df)

  key  value
0   A      1
1   B      1
2   A      2
3   B      2

Pandas - Groupby dataframe store as dataframe without aggregating

Tags: python, group-by

Fred
