
pandas Dataframe groupby, sort groups by absolute value

Hi everyone, I'm looking for an efficient way to sort grouped data by absolute value within each group.

For example:

item       itemID value
car          A     5
             B    -3
             C     2
             D    -4
             E     1
houses       A    -2
             B     4
             C    -6
             D     3
             E     7

Should be:

item        itemID value
 car          A      5
              D     -4             
              B     -3
              C      2
              E      1
houses        E      7
              C     -6
              B      4
              D      3
              A     -2

Here is the dataframe and groupby for reference:

import pandas as pd

data = {'item': ['car','car','car','car','car','houses','houses','houses','houses','houses'],
        'itemID': ['A','B','C','D','E','A','B','C','D','E'],
        'value': [5,-3,2,-4,1,-2,4,-6,3,7]}
df = pd.DataFrame(data)
gdf = df.groupby('item')

I've tried this:

gdf.apply(lambda g: g.reindex(g[['value']].abs().sort_values('value', ascending=True).index))

and it works fine most of the time, but sometimes it gives me this error:

ValueError: Shape of passed values is (100,10), indices imply (105, 10)

I don't get this error with the sample data above. I get it on some of the larger datasets I work with, which I can't share here, but I'm fairly sure the data itself isn't the cause, since the datasets are all very similar.

I've done some debugging, and every time I get this error, apply has duplicated the first group.

So is there a better way to do this without using apply?

Note: I tried transform, but it gets rid of the groups and outputs a different dataset, which is definitely not what I want; I want to keep the groups and the format. Maybe I'm using it wrong?

Blade Cutts asked Oct 29 '25 14:10


2 Answers

Consider simply creating an absolute-value column with a small function, then sorting by item ascending and absolute value descending. Finally, filter out the newly created, unneeded column. No groupby is needed for the sort itself, because the absolute value is computed row by row:

# CREATE ABS VALUE FUNCTION TO CREATE COLUMN
def valsort(frame):
    frame = frame.copy()  # leave the original df untouched
    frame['absvalue'] = frame['value'].abs()
    return frame

# APPLY FUNCTION, SORT, AND RESET INDEX
# (sort_values replaces DataFrame.sort, which recent pandas versions no longer provide)
gdf = valsort(df).sort_values(['item', 'absvalue'],
                              ascending=[True, False]).reset_index(drop=True)

# FILTER OUT ABS VALUE
gdf = gdf[['item', 'itemID', 'value']]

print(gdf)

OUTPUT

     item itemID  value
0     car      A      5
1     car      D     -4
2     car      B     -3
3     car      C      2
4     car      E      1
5  houses      E      7
6  houses      C     -6
7  houses      B      4
8  houses      D      3
9  houses      A     -2
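
On pandas 1.1 or later the helper column can probably be skipped, since `sort_values` accepts a `key` callable that is applied to each sort column separately. This is only a minimal sketch on the question's sample data; the function name `sort_by_abs_within_item` is made up for illustration and is not part of pandas:

```python
import pandas as pd

data = {'item': ['car'] * 5 + ['houses'] * 5,
        'itemID': ['A', 'B', 'C', 'D', 'E'] * 2,
        'value': [5, -3, 2, -4, 1, -2, 4, -6, 3, 7]}
df = pd.DataFrame(data)

def sort_by_abs_within_item(frame):
    # Hypothetical helper: 'item' ascending, then |value| descending.
    # The key callable receives each sort column as a Series, so only
    # the 'value' column is mapped to its absolute value.
    return frame.sort_values(
        ['item', 'value'],
        ascending=[True, False],
        key=lambda col: col.abs() if col.name == 'value' else col,
    ).reset_index(drop=True)

result = sort_by_abs_within_item(df)
print(result)
```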
Parfait answered Oct 31 '25 04:10


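In [48]:
# Assumes 'item' is the index (df = df.set_index('item')), with pandas as pd and numpy as np.
# Build one row order (item first, then descending |value|) and move whole rows,
# so every itemID stays with its own value.
order = np.lexsort((-np.abs(df['value'].to_numpy()), pd.factorize(df.index, sort=True)[0]))
df = df.iloc[order]
df
Out[48]:
       itemID  value
item
car         A      5
car         D     -4
car         B     -3
car         C      2
car         E      1
houses      E      7
houses      C     -6
houses      B      4
houses      D      3
houses      A     -2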
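
If the grouped object is still needed afterwards, one option is to sort first and group second, because `groupby` keeps rows in the order they have in the frame being grouped. A small sketch under that assumption, using the question's sample data; the names `ordered` and `groups` are just illustrative:

```python
import pandas as pd

data = {'item': ['car'] * 5 + ['houses'] * 5,
        'itemID': ['A', 'B', 'C', 'D', 'E'] * 2,
        'value': [5, -3, 2, -4, 1, -2, 4, -6, 3, 7]}
df = pd.DataFrame(data)

# Sort once: 'item' ascending, then absolute value descending.
# The helper column is dropped again after sorting.
ordered = (
    df.assign(absvalue=df['value'].abs())
      .sort_values(['item', 'absvalue'], ascending=[True, False])
      .drop(columns='absvalue')
)

# Grouping the sorted frame keeps the sorted row order inside each group.
groups = {name: g['itemID'].tolist() for name, g in ordered.groupby('item')}
print(groups)
```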
Nader Hisham answered Oct 31 '25 04:10


