I have been curious about what exactly is passed to the agg function.
Id      NAME  SUB_ID
276956  A     5933
276956  B     5934
276956  C     5935
287266  D     1589
So when I call an agg function, what exactly is the datatype of x?
df.groupby('Id').agg(lambda x: set(x))
From my own digging, I find x to be <type 'property'>, but I don't understand what exactly that is. What I am trying to do is compress the records into one row for each group. So for Id 276956, I want to have A, B, C in one cell under the NAME column. I have been doing this by converting to a set, but it's causing me some grief with NaN and None values. I was wondering what the best way is to compress the records into a single row. If these are numpy arrays then I don't really need to convert, but something like
df.groupby('Id').agg(lambda x: x)
throws an error
You are working with a Series:
print (df.groupby('Id').agg(lambda x: print(x)))
0 A
1 B
2 C
Name: NAME, dtype: object
3 D
Name: NAME, dtype: object
0 5933
1 5934
2 5935
Name: SUB_ID, dtype: int64
3 1589
Name: SUB_ID, dtype: int64
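To confirm this without reading the printed output, you can record the type of each object that agg passes in. This is a minimal sketch that rebuilds the sample frame from the question; record_type and seen_types are hypothetical names used only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

seen_types = []  # hypothetical helper list, collects the type names

def record_type(x):
    # Remember what kind of object agg handed us, then return one scalar
    seen_types.append(type(x).__name__)
    return len(x)

result = df.groupby("Id").agg(record_type)
print(set(seen_types))
print(result)
```

The function is called once per column and group (possibly more often, depending on the pandas version), so only the set of distinct type names is inspected here.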
You can use a custom function, but its output has to be aggregated (one value per group):
def f(x):
    print (x)
    return set(x)
print (df.groupby('Id').agg(f))
NAME SUB_ID
Id
276956 {C, B, A} {5933, 5934, 5935}
287266 {D} {1589}
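Since the question mentions trouble with NaN and None inside the sets, one possible approach is to drop missing values from each group before building the set. This is a sketch under assumptions: the sample data below adds missing values that are not in the original frame, and to_set is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the question's Ids, with missing NAME values added
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266, 287266],
    "NAME": ["A", None, "C", "D", np.nan],
})

def to_set(x):
    # x is a Series; dropna() removes both None and NaN before the set is built
    return set(x.dropna())

result = df.groupby("Id")["NAME"].agg(to_set)
print(result)
```

A group whose values are all missing would end up as an empty set, which may or may not be what you want.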
If you need to aggregate with join, numeric columns are omitted:
print (df.groupby('Id').agg(', '.join))
NAME
Id
276956 A, B, C
287266 D
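If the numeric columns should be joined as well, one option is to cast each group to strings first. This is a minimal sketch using the sample frame from the question; the astype(str) cast is an addition here, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# Cast every group to str so that join also works on the integer column
joined = df.groupby("Id").agg(lambda x: ", ".join(x.astype(str)))
print(joined)
```

Note that SUB_ID then holds strings rather than numbers, so this only suits display or export.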
With mean, string columns are omitted (newer pandas versions may need mean(numeric_only=True) for this):
print (df.groupby('Id').mean())
SUB_ID
Id
276956 5934
287266 1589
More commonly, the apply function is used (see flexible apply):
def f(x):
    print (x)
    return ', '.join(x)
print (df.groupby('Id')['NAME'].apply(f))
Id
276956 A, B, C
287266 D
Name: NAME, dtype: object
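The example above selects a single column, so f still receives a Series. When apply runs on a selection of several columns, each group arrives as one DataFrame instead of one column at a time. This is a small sketch to illustrate the difference, using the sample frame from the question; the variable name kinds is hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# Select the value columns explicitly, then report the type apply passes in
kinds = df.groupby("Id")[["NAME", "SUB_ID"]].apply(lambda g: type(g).__name__)
print(kinds)
```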
>>> df[['Id', 'NAME']].groupby('Id').agg(lambda x: ', '.join(x))
NAME
Id
276956 A, B, C
287266 D
The x in this case will be the series for each relevant grouping on Id.
To actually get a list of values:
>>> df[['Id', 'NAME']].groupby('Id').agg(lambda x: x.values.tolist())
NAME
Id
276956 [A, B, C]
287266 [D]
More generally, x will be a Series holding one column of the relevant grouping, and you can perform any action on it that you could normally do with a Series, e.g.
>>> df.groupby('Id').agg(lambda x: x.shape)
NAME SUB_ID
Id
276956 (3,) (3,)
287266 (1,) (1,)
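Because agg handles each column separately, a dict can assign a different aggregation to each column. This is a minimal sketch using the sample frame from the question; the choice of join for NAME and mean for SUB_ID is only an illustration.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Id": [276956, 276956, 276956, 287266],
    "NAME": ["A", "B", "C", "D"],
    "SUB_ID": [5933, 5934, 5935, 1589],
})

# One aggregation per column: join the names, average the numeric ids
per_column = df.groupby("Id").agg({"NAME": ", ".join, "SUB_ID": "mean"})
print(per_column)
```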