I have a dataframe like this:
df_test = pd.DataFrame({'ID1':['A','A','A','A','A','A','B','B','B','B'],
'ID2':['a','a','a','aa','aaa','aaa','b','b','bb','bb'],
'ID3':['c1','c2','c3','c4','c5','c6','c7','c8','c9','c10'],
'condition':['','!','','!','','','!','','','!']})
df_test
I want group by them by ID1. the result dataframe need to like this (the count_condition means the number of '!')for each ID2 group:
df_test_result = pd.DataFrame({'ID1':['A','A','A','B','B'],
'ID2':['a','aa','aaa','b','bb'],
'Count_ID2':[3,1,2,2,2],
'Count_ID3':[3,1,2,2,2],
'Count_condition': [1,1,0,1,1]})
df_test_result
I have tried to use groupby and agg to obtain this result, but I can not get the number of '!' for each group. here's my command
df_test_result = df_test.groupby(['ID1','ID2']).agg({'ID2':'count','ID3':'nunique','condition':'count'})
if there's a kind of command like this wrong one:
df_test = df_test.groupby(['ID1','ID2']).agg({'ID2':'count','ID3':'nunique','condition' == '!':'count'})
You can use a named groupby
:
df_test.groupby(
['ID1','ID2']).agg(
Count_ID2=('ID2', 'count'),
Count_ID3=('ID3', 'count'),
Count_condition=("condition", lambda x: str(x).count('!')))
prints:
Count_ID2 Count_ID3 Count_condition
ID1 ID2
A a 3 3 1
aa 1 1 1
aaa 2 2 0
B b 2 2 1
bb 2 2 1
In the above we are counting the occurences with aggfunc="count"
for columns "ID2" and "ID3", and creating a small custom function which count's the occurences of !
for the "condition" column. We do the aforementioned for each group and we returned named columns for our aggregation results
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With