Check uniqueness for a specific value in each column

Suppose I have a pandas.DataFrame df similar to this:

   A0  A1   A2
0   a   a    b
1   b   b    g 
2   c   b    h 
3   d   c  NaN

Now there are specific values that I want to check against that DataFrame. Let's call them

candidates = ["a", "b", "c", "g"]

For each candidate I want to check whether it occurs at most once within each column of my DataFrame (it may occur in multiple columns). The desired output for this set of candidates would be a DataFrame built like this:

pd.DataFrame(
    [
        [
            cand,
            pd.magic(cand)
        ] for cand in candidates
    ],
    columns=["cand", "unique"]
)

  cand  unique
0    a    True
1    b   False
2    c    True
3    g    True

Even better would be if, instead of True, it returned the number of matches (i.e. [2, False, 2, 1]).

I think I'll have to use pd.DataFrame.apply(), but I can't figure out how to check only for the candidates or how to bring the results for each column back together. Maybe something like df.apply(pd.value_counts).T[cand] <= 1 is a good starting point; it delivers a pd.Series with True or False for each column.

YPOC asked Dec 01 '25 10:12


1 Answer

Let's use DataFrame.eq to create a boolean mask for each candidate, then use sum to count its occurrences in each column, and finally use .lt + .all to check that it occurs fewer than two times in every column:

pd.DataFrame([{'cand': c, 'unique': df.eq(c).sum().lt(2).all()} for c in candidates])

  cand  unique
0    a    True
1    b   False
2    c    True
3    g    True
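
The question also asks for the number of matches instead of True. A minimal sketch of one way to get that, reusing the same DataFrame.eq + sum idea; the helper name unique_or_count is hypothetical and not part of pandas, and the sample frame is rebuilt from the question so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "A0": ["a", "b", "c", "d"],
    "A1": ["a", "b", "b", "c"],
    "A2": ["b", "g", "h", np.nan],
})
candidates = ["a", "b", "c", "g"]


def unique_or_count(frame, value):
    """Hypothetical helper: False if `value` repeats within any single
    column, otherwise the total number of matches across all columns."""
    per_column = frame.eq(value).sum()  # occurrences of `value` per column
    if per_column.gt(1).any():          # repeated inside at least one column
        return False
    return int(per_column.sum())


result = pd.DataFrame(
    [{"cand": c, "unique": unique_or_count(df, c)} for c in candidates]
)
print(result)
```

For the sample frame the unique column holds 2, False, 2, 1, matching the list in the question. Because the column mixes integers and a boolean, it ends up with object dtype; returning only the counts and keeping a separate boolean column may be easier to work with downstream.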

Shubham Sharma answered Dec 04 '25 01:12


