
Pandas groupby: show a count of 0 for groups that do not exist

I have code like this:

frame[frame['value_text'].str.match('Type 2')  | frame['value_text'].str.match('Type II diabetes')].groupby(['value_text','gender'])['value_text'].count()

which returns a Series like this:

value_text            gender      count
type 2                  M           4
type 2 without...       M           4
                        F           3

What I want is:

value_text            gender      count
type 2                  M           4
                        F           0
type 2 without...       M           4
                        F           3

I want to include a count for every gender, even when there are no matching records in the DataFrame. How can I do this?

Bejita asked Jan 28 '26 00:01



2 Answers

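Categorical data in pandas handles this case directly.

In effect, a groupby over a categorical column computes the Cartesian product of the grouping keys, so combinations with no rows still appear in the result. Pass observed=False to groupby to request this explicitly, since newer pandas releases do not assume it by default.

Compared with other approaches, categoricals also bring lower memory usage and validation of the allowed values.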

import pandas as pd

df = pd.DataFrame({'value_text': ['type2', 'type2 without', 'type2'],
                   'gender': ['M', 'F', 'M'],
                   'value': [1, 2, 3]})

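# Make gender categorical so that genders with no rows are still grouped.
df['gender'] = df['gender'].astype('category')

# observed=False keeps category combinations that have no rows.
res = df.groupby(['value_text', 'gender'], observed=False).count()\
        .fillna(0).astype(int)\
        .reset_index()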

print(res)

      value_text gender  value
0          type2      F      0
1          type2      M      2
2  type2 without      F      1
3  type2 without      M      0
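
To get a Series shaped like the one in the question, the same categorical idea can be combined with the original filter. This is a minimal sketch, not the asker's actual data: the `frame` contents below are made up, the two `str.match` calls are merged into one regular expression, and `size()` is swapped in for `['value_text'].count()` to count rows per group.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `frame`; the values are made up.
frame = pd.DataFrame({
    'value_text': ['Type 2', 'Type 2', 'Type 2 without',
                   'Type 2 without', 'Other'],
    'gender': ['M', 'M', 'M', 'F', 'F'],
})

# List every gender that should appear, even one with no matching rows.
frame['gender'] = pd.Categorical(frame['gender'], categories=['F', 'M'])

# str.match anchors at the start of the string, like the original two calls.
mask = frame['value_text'].str.match('Type 2|Type II diabetes')

# observed=False keeps unobserved categories; size() reports them as 0.
counts = (frame[mask]
          .groupby(['value_text', 'gender'], observed=False)
          .size())

print(counts)
```

Because the categories are declared up front, a gender that is missing from the filtered rows entirely would still get a row with a count of 0.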
jpp answered Jan 30 '26 15:01



Try appending .unstack().fillna(0).stack() to your current line, like so:

frame[frame['value_text'].str.match('Type 2')  |
      frame['value_text'].str.match('Type II diabetes')]\
.groupby(['value_text','gender'])['value_text'].count()\
.unstack().fillna(0).stack()
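
A self-contained sketch of the same round trip on made-up data follows. It assumes a toy `frame` (not the asker's), uses `size()` to count rows, and passes `fill_value=0` to `unstack` in place of the separate `fillna(0)` call, which should keep the counts as integers rather than floats.

```python
import pandas as pd

# Hypothetical data: 'type 2' has no rows for gender 'F'.
frame = pd.DataFrame({
    'value_text': ['type 2', 'type 2', 'type 2 without', 'type 2 without'],
    'gender': ['M', 'M', 'M', 'F'],
})

# Only combinations that occur in the data show up here.
counts = frame.groupby(['value_text', 'gender']).size()

# unstack pivots gender into columns, filling missing pairs with 0;
# stack pivots the columns back into the row index.
filled = counts.unstack(fill_value=0).stack()

print(filled)
```

Note that this only fills in genders that occur at least once somewhere in the filtered data; a gender that is absent from every group never becomes a column, so it cannot be restored this way.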
Peter Leimbigler answered Jan 30 '26 14:01
