
Pandas: what is the data-type of object passed to the agg function

I have been curious about what exactly is passed to the agg function. Given this DataFrame:

Id      NAME   SUB_ID
276956  A      5933
276956  B      5934
276956  C      5935
287266  D      1589

So when I call an agg function, what exactly is the datatype of x?

df.groupby('Id').agg(lambda x: set(x))

From my own digging, I find x to be <type 'property'>, but I don't understand what exactly that is. What I am trying to do is compress the records into one row for each group. So for Id 276956, I want to have A, B, C in one cell under the NAME column. I have been doing it by converting to a set, but it's causing me some grief with NaN and None values. I was wondering what the best way is to compress a group into a single row. If these are numpy arrays then I don't really need to convert, but something like

df.groupby('Id').agg(lambda x: x)

throws an error

Fizi asked Dec 28 '25 00:12



2 Answers

You are working with a Series: agg calls the function once per column for each group.

print (df.groupby('Id').agg(lambda x: print(x)))
0    A
1    B
2    C
Name: NAME, dtype: object
3    D
Name: NAME, dtype: object
0    5933
1    5934
2    5935
Name: SUB_ID, dtype: int64
3    1589
Name: SUB_ID, dtype: int64
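
If you would rather check the type directly than read it off the printed output, here is a minimal sketch. It rebuilds the question's data by hand, and the helper name record and the list seen are made up for illustration. The exact number of calls may differ between pandas versions, so only the type and column name are inspected.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

seen = []  # one (type name, column name, row count) entry per call


def record(x):
    # Hypothetical helper: note what agg passed in, then reduce to one value
    seen.append((type(x).__name__, x.name, len(x)))
    return len(x)  # agg needs a single value per group


result = df.groupby("Id").agg(record)

for entry in seen:
    print(entry)
```

Each entry should report Series as the type, with the column label available as x.name.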

You can use a custom function, but it has to return a single aggregated value per group:

def f(x):
    print (x)
    return set(x)

print (df.groupby('Id').agg(f))
             NAME              SUB_ID
Id                                   
276956  {C, B, A}  {5933, 5934, 5935}
287266        {D}              {1589}     

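If you need to aggregate with join, numeric columns are omitted (recent pandas versions may raise a TypeError here instead, in which case select the string column first):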

print (df.groupby('Id').agg(', '.join))
           NAME
Id             
276956  A, B, C
287266        D

If you take the mean, string columns are omitted (recent pandas versions may require numeric_only=True for this):

print (df.groupby('Id').mean())
        SUB_ID
Id            
276956    5934
287266    1589

The apply function is more commonly used (see "flexible apply" in the pandas groupby documentation):

def f(x):
    print (x)
    return ', '.join(x)

print (df.groupby('Id')['NAME'].apply(f))
Id
276956    A, B, C
287266          D
Name: NAME, dtype: object
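
Regarding the trouble with NaN and None mentioned in the question: one possible approach, sketched here under the assumption that missing names should simply be skipped, is to drop them inside the function before joining. The sample data with missing values and the helper name join_non_missing are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up variant of the question's data with missing names
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", None, "C", np.nan],
})


def join_non_missing(s):
    # dropna() removes both None and NaN, so join only sees real values
    return ", ".join(s.dropna().astype(str))


joined = df.groupby("Id")["NAME"].agg(join_non_missing)
print(joined)
```

A group with no usable names ends up as an empty string here; whether that or NaN is the better placeholder depends on what you do with the result afterwards.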
jezrael answered Dec 30 '25 13:12



>>> df[['Id', 'NAME']].groupby('Id').agg(lambda x: ', '.join(x))
           NAME
Id             
276956  A, B, C
287266        D

The x in this case will be the series for each relevant grouping on Id.

To actually get a list of values:

>>> df[['Id', 'NAME']].groupby('Id').agg(lambda x: x.values.tolist())
             NAME
Id               
276956  [A, B, C]
287266        [D]

More generally, x is a Series holding one column's values for the relevant group, and you can perform any action on it that you could normally do with a Series. The one-dimensional shapes below confirm this, e.g.

>>> df.groupby('Id').agg(lambda x: x.shape)
        NAME SUB_ID
Id                 
276956  (3,)   (3,)
287266  (1,)   (1,)
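
Because the function receives one column at a time, you can also give each column its own aggregation by passing a dict to agg. A small sketch, assuming you want the names joined into a string and the sub-ids kept as a list (that choice is only an example):

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# Map each column to the function that should collapse it to one cell
out = df.groupby("Id").agg({"NAME": ", ".join, "SUB_ID": list})
print(out)
```

This gives one row per Id, and it avoids applying join to the numeric column.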
Alexander answered Dec 30 '25 12:12
