I have two columns of categories, with the same possible options showing up in both columns, and I'm looking to count the number of rows per unique combination, regardless of order of the categories in each row (column A or column B).
Example dataframe:
df1 = pd.DataFrame({'A':['dog','cat','mouse','bunny','cat','mouse','dog'],
'B':['cat','dog','dog','mouse','bunny','bunny','cat']})
Which would look like:
----------------------------
index A B
0 dog cat
1 cat dog
2 mouse dog
3 bunny mouse
4 cat bunny
5 mouse bunny
6 dog cat
-----------------------------
And I'd like to result to look like:
--------------------------------------
index A B count
0 dog cat 3
1 bunny mouse 2
2 mouse dog 1
3 cat bunny 1
--------------------------------------
I can get counts of combinations where dog/cat is counted separately from cat/dog but I'm struggling with how to get counts regardless of which category is first.
Another solution, using .groupby
:
x = (
df1.groupby(df1.apply(lambda x: tuple(sorted(x)), axis=1))
.agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
.reset_index(drop=True)
)
print(x)
Prints:
A B count
0 cat bunny 1
1 bunny mouse 2
2 dog cat 3
3 mouse dog 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With