I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:
The qtd_ok
column counts only those that have 'OK'
I tried this, but I do not know how to add the total count in the same groupby
:
df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')
First create helper column A
with assign
and then aggregate by agg
functions sum
for count only OK
values and size
for count all values per groups:
df = (df.assign(A=(df['status']== 'OK'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
Sample:
df = pd.DataFrame({
'column1':['a'] * 9,
'column2':['a'] * 4 + ['b'] * 5,
'column3':list('aaabaabbb'),
'status':list('aabaaabba'),
})
print (df)
column1 column2 column3 status
0 a a a a
1 a a a a
2 a a a b
3 a a b a
4 a b a a
5 a b a a
6 a b b b
7 a b b b
8 a b b a
df = (df.assign(A=(df['status']== 'a'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
print (df)
column1 column2 column3 qtd_ok qtd
0 a a a 2 3
1 a a b 1 1
2 a b a 2 2
3 a b b 1 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With