Applying pandas groupby for each index

Question

I have a dataframe with a person's name as the index (a name can appear in multiple rows) and two columns, 'X' and 'Y'. Each value in 'X' and 'Y' is a letter from A to C.

For example:

import pandas as pd
df = pd.DataFrame({'X': ['A', 'B', 'A', 'C'], 'Y': ['B', 'A', 'A', 'C']}, index=['Bob', 'Bob', 'John', 'Mike'])

For each person (i.e. index) I would like to get the number of occurrences of every unique combination of columns 'X' and 'Y' (for example - for Bob I have 1 count of ('A','B') and 1 count of ('B','A')).

When I do the following:

df.loc['Bob'].groupby(['X','Y']).size()

I get the correct results for Bob. How can I do this for each person without a loop? Ideally, I would get a dataframe with the people as the index, every unique combination of 'X' and 'Y' as the columns, and the number of times each combination appears as the values.

    ('A','A') ('A','B') ('A','C') ('B','A') ... ('C','C')
Bob     0         1         0         1             0
John    1         0         0         0             0
Mike    0         0         0         0             1

piRSquared · Accepted Answer

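Use get_dummies and groupby: turn each row into an (X, Y) tuple, one-hot encode the tuples, then sum the indicator columns per index label.

pd.get_dummies(df.apply(tuple, axis=1)).groupby(level=0).sum()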

      (A, A)  (A, B)  (B, A)  (C, C)
Bob        0       1       1       0
John       1       0       0       0
Mike       0       0       0       1
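To make the three steps of that one-liner visible, here is a minimal sketch of the same one-hot-and-sum idea. It is a variation, not the answer's exact code: it assumes string labels such as 'AB' in place of tuples, passes `dtype=int` so the indicators are integers, and the names `pairs`, `dummies` and `counts` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'X': ['A', 'B', 'A', 'C'], 'Y': ['B', 'A', 'A', 'C']},
    index=['Bob', 'Bob', 'John', 'Mike'],
)

# Step 1: one label per row. String labels such as 'AB' stand in for
# the (X, Y) tuples (an assumption made for this sketch).
pairs = df['X'] + df['Y']

# Step 2: one indicator column per distinct label, one row per original row.
dummies = pd.get_dummies(pairs, dtype=int)

# Step 3: add up the indicators for rows that share an index label.
counts = dummies.groupby(level=0).sum()
print(counts)
```

Only combinations that actually occur in the data get a column here, which is the same behaviour as in the output above.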

jezrael · Answer

I think you can use:

# convert columns X and Y to tuples
df['tup'] = list(zip(df.X, df.Y))

# count each (index, tup) pair and reshape the tuples into columns
df1 = df.reset_index().groupby(['index','tup']).size().unstack(fill_value=0)
print(df1)
tup    (A, A)  (A, B)  (B, A)  (C, C)
index                                
Bob         0       1       1       0
John        1       0       0       0
Mike        0       0       0       1

# get all possible combinations of the observed X and Y values
from itertools import product
comb = list(product(df.X.unique(), df.Y.unique()))
print(comb)
[('A', 'B'), ('A', 'A'), ('A', 'C'), ('B', 'B'), ('B', 'A'), 
 ('B', 'C'), ('C', 'B'), ('C', 'A'), ('C', 'C')]

# reindex the columns by these combinations, filling missing ones with 0
print(df1.reindex(columns=comb, fill_value=0))
tup    (A, B)  (A, A)  (A, C)  (B, B)  (B, A)  (B, C)  (C, B)  (C, A)  (C, C)
index                                                                        
Bob         1       0       0       0       1       0       0       0       0
John        0       1       0       0       0       0       0       0       0
Mike        0       0       0       0       0       0       0       0       1
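The count-then-reindex steps can also be sketched without building tuples, by keeping X and Y as two column levels. This is an alternative under stated assumptions, not the code above: the index name `name`, the `letters` list (taken from the A to C range in the question) and the variable names are all illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'X': ['A', 'B', 'A', 'C'], 'Y': ['B', 'A', 'A', 'C']},
    index=['Bob', 'Bob', 'John', 'Mike'],
)

# Count each (person, X, Y) triple, then pivot X and Y into the columns.
counts = (
    df.rename_axis('name')
      .groupby(['name', 'X', 'Y'])
      .size()
      .unstack(['X', 'Y'], fill_value=0)
)

# Every possible (X, Y) pair, assuming the alphabet A-C from the question.
letters = ['A', 'B', 'C']
all_pairs = pd.MultiIndex.from_product([letters, letters], names=['X', 'Y'])

# Add the pairs that never occurred as all-zero columns.
full = counts.reindex(columns=all_pairs, fill_value=0)
print(full)
```

The columns here are a two-level MultiIndex rather than a flat index of tuples, so a single count is read with something like `full.loc['Bob', ('A', 'B')]`.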

Tags:

python

pandas

Eyal S.
