I have code like this:
frame[frame['value_text'].str.match('Type 2') |
      frame['value_text'].str.match('Type II diabetes')]\
    .groupby(['value_text','gender'])['value_text'].count()
which returns a Series like:
value_text         gender  count
type 2             M       4
type 2 without...  M       4
                   F       3
What I want is:
value_text         gender  count
type 2             M       4
                   F       0
type 2 without...  M       4
                   F       3
I want to include a count for every gender, even when there is no matching record in the DataFrame. How can I do this?
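Pandas' categorical dtype handles this case well.
When a groupby key is categorical and observed=False is in effect (the long-standing default, which newer pandas versions are deprecating in favor of observed=True), the result includes every category, even those with no rows, so you effectively get the Cartesian product of the grouping keys.
Categoricals also bring other benefits: lower memory usage for columns with many repeated values, and a degree of data validation, since a value outside the declared categories cannot be assigned.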
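As a quick side check of those two benefits before the main example, here is a minimal sketch on a made-up column of 10,000 gender labels. The data is hypothetical, and the exact byte counts depend on the pandas version and platform; the point is only the relative size and the rejected assignment.

```python
import pandas as pd

# Hypothetical column: 10,000 repeated gender labels
genders = pd.Series(["M", "F"] * 5_000)   # plain string column
as_category = genders.astype("category")  # small integer codes + 2 categories

# deep=True also counts the memory held by the string values themselves
obj_bytes = genders.memory_usage(deep=True)
cat_bytes = as_category.memory_usage(deep=True)
print(f"strings: {obj_bytes} bytes, category: {cat_bytes} bytes")

# Assigning a value outside the declared categories is rejected
# (TypeError in recent pandas; some older versions raised ValueError).
cat = pd.Categorical(["M", "F"], categories=["F", "M"])
try:
    cat[0] = "X"
    rejected = False
except (TypeError, ValueError):
    rejected = True
print("new category rejected:", rejected)
```

To accept a new label on purpose, you would add it to the categories first rather than assign it directly.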
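import pandas as pd
df = pd.DataFrame({'value_text': ['type2', 'type2 without', 'type2'],
                   'gender': ['M', 'F', 'M'],
                   'value': [1, 2, 3]})
# Make gender categorical so groupby knows about every gender
df['gender'] = df['gender'].astype('category')
# observed=False keeps (value_text, gender) pairs that have no rows;
# fillna(0).astype(int) is a defensive step in case any of them come back as NaN
res = df.groupby(['value_text', 'gender'], observed=False).count()\
        .fillna(0).astype(int)\
        .reset_index()
print(res)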
      value_text  gender  value
0          type2       F      0
1          type2       M      2
2  type2 without       F      1
3  type2 without       M      0
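Applied to the question's own filter, one more detail matters: astype('category') only knows the genders present in the column, so if a gender were missing from the data entirely it would still not show up. A minimal sketch, assuming a small hypothetical stand-in for `frame` (the values such as "Type 2 without complication" are made up), lists the categories explicitly with pd.Categorical before filtering and grouping:

```python
import pandas as pd

# Hypothetical stand-in for the question's `frame`
frame = pd.DataFrame({
    "value_text": ["Type 2", "Type 2", "Type 2 without complication",
                   "Type 2 without complication", "Other"],
    "gender": ["M", "M", "M", "F", "F"],
})

# Declare every gender up front, so a gender with no matching rows
# still becomes a group.
frame["gender"] = pd.Categorical(frame["gender"], categories=["F", "M"])

mask = (frame["value_text"].str.match("Type 2")
        | frame["value_text"].str.match("Type II diabetes"))

# observed=False: include category combinations that have no rows (count 0).
# size() counts rows per group.
counts = (frame[mask]
          .groupby(["value_text", "gender"], observed=False)
          .size())
print(counts)
```

Here ("Type 2", "F") appears with a count of 0 even though no such row exists, while "Other" is excluded by the filter.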
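Try appending .unstack().fillna(0).stack() to your current line. Because unstack() fills the missing (value_text, gender) pairs with NaN, the counts become floats; a final .astype(int) restores integers:
frame[frame['value_text'].str.match('Type 2') |
      frame['value_text'].str.match('Type II diabetes')]\
    .groupby(['value_text','gender'])['value_text'].count()\
    .unstack().fillna(0).stack().astype(int)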
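To see what each step does, here is a self-contained run on a small hypothetical stand-in for `frame` (the values are made up). It uses .size() to count rows per group, which gives the same numbers as ['value_text'].count() here because value_text has no missing values.

```python
import pandas as pd

# Hypothetical stand-in for the question's `frame`
frame = pd.DataFrame({
    "value_text": ["Type 2", "Type 2", "Type 2 without complication",
                   "Type 2 without complication", "Other"],
    "gender": ["M", "M", "M", "F", "F"],
})

mask = (frame["value_text"].str.match("Type 2")
        | frame["value_text"].str.match("Type II diabetes"))

counts = (frame[mask]
          .groupby(["value_text", "gender"])
          .size()
          .unstack()      # genders become columns; missing pairs become NaN
          .fillna(0)      # replace those NaN holes with 0
          .stack()        # back to a Series indexed by (value_text, gender)
          .astype(int))   # NaN forced a float dtype; restore integer counts
print(counts)
```

Series.unstack also accepts fill_value=0, which should fill the gaps in one step and keep integer counts. Note that unstack can only create a column for a gender that appears somewhere in the filtered result; if a gender is absent entirely, the categorical approach with the categories listed explicitly (pd.Categorical(..., categories=[...])) is needed to get it.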