I am using pandas groupby and want to apply a function that builds a set from the items in each group.
The following results in TypeError: 'type' object is not iterable:
df = df.groupby('col1')['col2'].agg({'size': len, 'set': set})
But the following works:
def to_set(x):
    return set(x)
df = df.groupby('col1')['col2'].agg({'size': len, 'set': to_set})
As I understand it, the two expressions are equivalent. Why does the first one not work?
You can group DataFrame rows into lists by calling pandas.DataFrame.groupby() on the column of interest, selecting the column whose values you want from each group, and then calling apply(list) on that selection to get a list for every group.
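A minimal sketch of the groupby/apply(list) pattern, assuming a small hypothetical DataFrame that reuses the question's column names col1 and col2:

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the question.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": [1, 2, 2, 3, 3],
})

# Select col2 within each col1 group and collect its values into a list.
# Order inside each group follows the original row order.
as_list = df.groupby("col1")["col2"].apply(list)
print(as_list)
```

Each entry of the resulting Series is a plain Python list, indexed by the group key.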
Groupby is a very powerful pandas method. For example, you can group by one column and use value_counts to count how often each value of another column occurs within each group.
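A short sketch of groupby combined with value_counts, again on hypothetical sample data with the question's column names:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": [1, 2, 2, 3, 3],
})

# Count how often each col2 value occurs within each col1 group.
# The result is a Series indexed by (col1, col2) pairs.
counts = df.groupby("col1")["col2"].value_counts()
print(counts)
```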
Note: depending on the pandas version, passing set directly does not always result in TypeError: 'type' object is not iterable.
Where it does, it's because set is of type type, whereas to_set is of type function:
>>> type(set)
<class 'type'>
>>> def to_set(x):
...     return set(x)
...
>>> type(to_set)
<class 'function'>
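The distinction above can be checked directly with the standard library. This sketch only shows that set is a class while to_set and an equivalent lambda are plain functions, even though all of them are callable and give the same result; it does not show which check pandas performs internally, which varies between versions.

```python
import inspect

def to_set(x):
    return set(x)

# `set` is a built-in class, so its type is `type`, not `function`.
print(type(set))                # <class 'type'>
print(isinstance(set, type))    # True
print(inspect.isfunction(set))  # False

# `to_set` and the lambda from the workaround are ordinary functions.
print(type(to_set))                              # <class 'function'>
print(inspect.isfunction(to_set))                # True
print(inspect.isfunction(lambda x: set(x)))      # True

# All of them are callable and produce the same set.
print(callable(set), callable(to_set))           # True True
print(set([1, 2, 2]) == to_set([1, 2, 2]))       # True
```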
According to the docs, .agg() expects:
arg : function or dict
    Function to use for aggregating groups.
    - If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
    - If passed a dict, the keys must be DataFrame column names.
    Accepted combinations are:
    - string cythonized function name
    - function
    - list of functions
    - dict of columns -> functions
    - nested dict of names -> dicts of functions
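A sketch of the accepted combinations that still work in current pandas, on hypothetical sample data. The nested dict of names form is left out on purpose: newer pandas rejects it with the SpecificationError mentioned in the update.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": [1, 2, 2, 3, 3],
})
grouped = df.groupby("col1")["col2"]

# String name of a built-in aggregation.
by_name = grouped.agg("nunique")
# A plain function, called once per group.
by_func = grouped.agg(len)
# A list of functions: one output column per entry.
by_list = grouped.agg(["count", "nunique"])
# A dict mapping column names to functions (on the DataFrame groupby).
by_dict = df.groupby("col1").agg({"col2": "nunique"})

print(by_name, by_func, by_list, by_dict, sep="\n\n")
```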
Try using:
df = df.groupby('col1')['col2'].agg({'size': len, 'set': lambda x: set(x)})
Works for me.
Update: in newer versions of pandas, if you get the following error:
SpecificationError: nested renamer is not supported
use named aggregation instead:
df = df.groupby('col1')['col2'].agg(size=len, set=lambda x: set(x))
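The keyword-argument form above is pandas' named aggregation, available since pandas 0.25. This self-contained sketch runs it on hypothetical sample data and also passes the question's named to_set function, which should give the same set column:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": [1, 2, 2, 3, 3],
})

def to_set(x):
    return set(x)

# Each keyword becomes an output column: size=len, set=<set of values>.
out = df.groupby("col1")["col2"].agg(size=len, set=lambda x: set(x))
# Same aggregation using the named function from the question.
out2 = df.groupby("col1")["col2"].agg(size=len, set=to_set)
print(out)
```

The keyword names become the output column names, so no renaming dict is needed.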