Count unique combinations regardless of column order

Question

I have two columns of categories, with the same possible options showing up in both columns, and I'm looking to count the number of rows per unique combination, regardless of order of the categories in each row (column A or column B).

Example dataframe:

df1 = pd.DataFrame({'A':['dog','cat','mouse','bunny','cat','mouse','dog'],
                   'B':['cat','dog','dog','mouse','bunny','bunny','cat']})

Which would look like:

----------------------------
index         A        B
0           dog       cat
1           cat       dog
2           mouse     dog
3           bunny     mouse
4           cat       bunny
5           mouse     bunny
6           dog       cat
-----------------------------

And I'd like to result to look like:

--------------------------------------
index         A        B        count
0           dog       cat       3
1           bunny     mouse     2
2           mouse     dog       1
3           cat       bunny     1
--------------------------------------

I can get counts of combinations where dog/cat is counted separately from cat/dog but I'm struggling with how to get counts regardless of which category is first.

Andrej Kesely · Accepted Answer

Another solution, using .groupby:

x = (
    df1.groupby(df1.apply(lambda x: tuple(sorted(x)), axis=1))
    .agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
    .reset_index(drop=True)
)
print(x)

Prints:

       A      B  count
0    cat  bunny      1
1  bunny  mouse      2
2    dog    cat      3
3  mouse    dog      1

Count unique combinations regardless of column order

Tags:

python

pandas

count

combinations

Mark

1 Answers

Andrej Kesely

Recent Activity

Donate For Us

Count unique combinations regardless of column order

Tags:

python

pandas

count

combinations

Mark

1 Answers

Andrej Kesely

Related questions

Recent Activity

Donate For Us