
Applying pandas groupby for each index

Tags: python, pandas

I have a DataFrame indexed by person name (a name can appear more than once) with two columns, 'X' and 'Y'. Each value in 'X' and 'Y' is one of the letters A, B, or C.

For example:

import pandas as pd

df = pd.DataFrame({'X': ['A', 'B', 'A', 'C'], 'Y': ['B', 'A', 'A', 'C']}, index=['Bob', 'Bob', 'John', 'Mike'])

For each person (i.e. each index value) I would like to count the occurrences of every unique combination of columns 'X' and 'Y'. For example, Bob has one count of ('A','B') and one count of ('B','A').

When I do the following:

df.loc['Bob'].groupby(['X','Y']).size() 

I get the correct results for Bob. How can I do this for each person without a loop? Ideally, I would get a DataFrame with the people as the index, every unique combination of 'X' and 'Y' as the columns, and the number of times each combination appeared for that person as the values.

    ('A','A') ('A','B') ('A','C') ('B','A') ... ('C','C')
Bob     0         1         0         1             0
John    1         0         0         0             0
Mike    0         0         0         0             1
Eyal S. asked Nov 19 '25 03:11



2 Answers

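Using get_dummies and groupby: turn each row into an (X, Y) tuple, one-hot encode the tuples, then sum the indicator columns per person.

pd.get_dummies(df.apply(tuple, axis=1)).groupby(level=0).sum()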

      (A, A)  (A, B)  (B, A)  (C, C)
Bob        0       1       1       0
John       1       0       0       0
Mike       0       0       0       1
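
For comparison, pd.crosstab can build a similar table without one-hot encoding tuples. This is only a minimal sketch and not part of the answer above: it assumes the data from the question, and the names `tmp`, `name` and `table` are made up here for illustration. The index is moved into a regular column first so the repeated 'Bob' label cannot interfere with alignment.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": ["A", "B", "A", "C"], "Y": ["B", "A", "A", "C"]},
    index=["Bob", "Bob", "John", "Mike"],
)

# Move the person index into a regular column (hypothetical name "name").
tmp = df.rename_axis("name").reset_index()

# Rows: one per person. Columns: a two-level (X, Y) index holding only
# the combinations that actually occur in the data.
table = pd.crosstab(tmp["name"], [tmp["X"], tmp["Y"]])
print(table)
```

As with get_dummies, only the observed combinations show up as columns; combinations that never occur would have to be added separately.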
piRSquared answered Nov 20 '25 16:11



I think you can use:

# convert columns X and Y to tuples
df['tup'] = list(zip(df.X, df.Y))

# count each (person, tuple) pair, then pivot the tuples into columns
df1 = df.reset_index().groupby(['index', 'tup']).size().unstack(fill_value=0)
print(df1)
tup    (A, A)  (A, B)  (B, A)  (C, C)
index                                
Bob         0       1       1       0
John        1       0       0       0
Mike        0       0       0       1

# get every combination of the observed X and Y values
from itertools import product
comb = list(product(df.X.unique(), df.Y.unique()))
print(comb)
[('A', 'B'), ('A', 'A'), ('A', 'C'), ('B', 'B'), ('B', 'A'), 
 ('B', 'C'), ('C', 'B'), ('C', 'A'), ('C', 'C')]

# reindex the columns by these combinations, filling the missing ones with 0
print(df1.reindex(columns=comb, fill_value=0))
tup    (A, B)  (A, A)  (A, C)  (B, B)  (B, A)  (B, C)  (C, B)  (C, A)  (C, C)
index                                                                        
Bob         1       0       0       0       1       0       0       0       0
John        0       1       0       0       0       0       0       0       0
Mike        0       0       0       0       0       0       0       0       1
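
A possible variation on the steps above keeps X and Y as two column levels instead of building a tuple column, and reindexes against the sorted product of the observed letters so the columns come out in ('A', 'A') ... ('C', 'C') order as in the question. This is a sketch under the assumption that the data is the question's example; the names `name`, `counts`, `letters` and `all_pairs` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": ["A", "B", "A", "C"], "Y": ["B", "A", "A", "C"]},
    index=["Bob", "Bob", "John", "Mike"],
)

# Move the person index into a column (hypothetical name "name"),
# count each (name, X, Y) triple, then pivot X and Y into the columns.
counts = (
    df.rename_axis("name")
      .reset_index()
      .groupby(["name", "X", "Y"])
      .size()
      .unstack(["X", "Y"], fill_value=0)
)

# Build every pair of observed letters in sorted order and reindex the
# columns against it, so combinations that never occur appear as 0.
letters = sorted(set(df["X"]) | set(df["Y"]))
all_pairs = pd.MultiIndex.from_product([letters, letters], names=["X", "Y"])
counts = counts.reindex(columns=all_pairs, fill_value=0)
print(counts)
```

Using the union of both columns' letters means a letter seen only in 'X' still gets a column on the 'Y' side; whether that is wanted depends on the data.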
jezrael answered Nov 20 '25 18:11
