Suppose I have a pandas.DataFrame df similar to this:
A0 A1 A2
0 a a b
1 b b g
2 c b h
3 d c NaN
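To make the example reproducible, the frame above can be rebuilt like this. This is a sketch that assumes the empty cell in column A2 is a real missing value (np.nan) rather than the literal string "NaN".

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; row 3 of A2 is assumed to be a true missing value.
df = pd.DataFrame(
    {
        "A0": ["a", "b", "c", "d"],
        "A1": ["a", "b", "b", "c"],
        "A2": ["b", "g", "h", np.nan],
    }
)
print(df)
```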
Now there are specific values that I want to check against that DataFrame; let's call them candidates:
candidates = ["a", "b", "c", "g"]
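For each candidate I want to check whether it is unique within each column of my DataFrame, i.e. it occurs at most once in every column (it may still appear in several different columns). For this set of candidates, the desired output is the DataFrame built below, where pd.magic stands for the check I don't know how to write: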
pd.DataFrame(
    [[cand, pd.magic(cand)] for cand in candidates],
    columns=["cand", "unique"],
)
cand unique
0 a True
1 b False
2 c True
3 g True
Even better would be if, instead of True, it returned the total number of matches (i.e. [2, False, 2, 1]).
I think I'll have to use pd.DataFrame.apply(); however, I can't figure out how to check only the candidates or how to combine the per-column results. Maybe something like df.apply(pd.value_counts).T[cand] <= 1 is a good starting point; it yields a pd.Series with True or False for each column.
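For reference, here is a sketch of how that value_counts starting point could be completed, assuming df and candidates as in the question. Two adjustments seem necessary: a value that never occurs in a column comes out of the aligned table as NaN, and NaN <= 1 evaluates to False, so those cells are filled with 0; and reindex restricts the table to the candidates. Series.value_counts is called through a lambda because recent pandas versions deprecate the top-level pd.value_counts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A0": ["a", "b", "c", "d"],
        "A1": ["a", "b", "b", "c"],
        "A2": ["b", "g", "h", np.nan],
    }
)
candidates = ["a", "b", "c", "g"]

# One value_counts per column; apply aligns them into a single table
# indexed by value, with NaN where a value does not occur in a column.
counts = df.apply(lambda col: col.value_counts())

# Keep only the candidate rows and treat "absent" as a count of 0,
# otherwise NaN <= 1 would evaluate to False.
counts = counts.reindex(candidates).fillna(0).astype(int)

# A candidate is unique if it occurs at most once in every column.
unique = counts.le(1).all(axis=1)
print(unique)
```

Compared with transposing and selecting one candidate at a time, this keeps all candidates in one table, so the per-column results are combined with a single .all(axis=1).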
Use DataFrame.eq to build a boolean mask for each candidate, then sum to count its occurrences in each column, and finally .lt(2) plus .all to check that it occurs fewer than two times in every column:
import pandas as pd
pd.DataFrame([{'cand': c, 'unique': df.eq(c).sum().lt(2).all()} for c in candidates])
cand unique
0 a True
1 b False
2 c True
3 g True
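The answer above covers the True/False version; for the "even better" variant from the question, here is a self-contained sketch built on the same eq/sum idea. count_if_unique is a hypothetical helper name introduced here for illustration: it returns the total number of matches when the candidate occurs at most once per column, and False otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A0": ["a", "b", "c", "d"],
        "A1": ["a", "b", "b", "c"],
        "A2": ["b", "g", "h", np.nan],
    }
)
candidates = ["a", "b", "c", "g"]


def count_if_unique(frame, cand):
    """Hypothetical helper: total matches if cand occurs at most once per column, else False."""
    per_column = frame.eq(cand).sum()  # number of matches in each column
    if per_column.le(1).all():
        return int(per_column.sum())
    return False


result = pd.DataFrame(
    [{"cand": c, "unique": count_if_unique(df, c)} for c in candidates]
)
print(result)
```

Mixing ints and False gives the unique column object dtype, and a candidate that never appears would get 0, which compares equal to False in Python. If that distinction matters, two separate columns (a boolean unique flag and an integer count) may be cleaner.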