
How would I count all unique values of a dataframe in python without double counting?

Let's suppose I have a pandas DataFrame that looks something like this:

Factor_1    Factor_2    Factor_3   Factor_4   Factor_5
   A           B           A          Nan       Nan
   B           D           F          A         Nan
   F           A           D          B          A

I have 5 columns that hold different factors. I would like to build a table that counts how many times each factor appears in the DataFrame, but without double counting within a row: if a value appears more than once in the same row, it counts only once for that row. For example, if a row contains A, B, C, A, A, only one A is counted. The expected output would be this:

Factor   Count
  A        3
  B        3
  D        2
  F        2
 Nan       2
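
For reference, the counting rule can be spelled out in plain Python. This is only a sketch of the intended result, not an efficient solution. The way the frame is built, the names `df` and `counts`, and the use of `np.nan` for the Nan cells are assumptions made for illustration.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Sample frame from above; np.nan stands in for the "Nan" cells (assumption).
df = pd.DataFrame(
    [
        ["A", "B", "A", np.nan, np.nan],
        ["B", "D", "F", "A", np.nan],
        ["F", "A", "D", "B", "A"],
    ],
    columns=[f"Factor_{i}" for i in range(1, 6)],
)

counts = Counter()
for row in df.itertuples(index=False):
    # Map every missing value to a single "NaN" label so it is de-duplicated
    # like any other value, then count each label at most once per row.
    labels = {"NaN" if pd.isna(v) else v for v in row}
    counts.update(labels)

print(counts)
```

Each row contributes at most 1 to any factor, which is the "no double counting" behaviour described above.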

I used some code I was helped with:

df.stack(dropna=False).value_counts(dropna=False)

I was using an if statement to drop the double counts, but I would like to know whether there is a practical and simple way to do this, like the code above, without an if, because what I am doing is not efficient.

Pandas INC asked Jan 31 '26 03:01


1 Answer

You can use Series.unique + Series.value_counts:

s = pd.Series(np.hstack(df.T.apply(pd.Series.unique))).value_counts(dropna=False)

B      3
A      3
F      2
D      2
NaN    2
dtype: int64
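
As a hedged alternative, the same counts can be obtained without `apply` by reshaping to long form and dropping duplicate (row, value) pairs. This is a sketch rather than the method above: the sample frame construction, the `factor` column name, and the variable names are assumptions. It may be more robust when every row happens to have the same number of unique values, a case in which `apply` can return a DataFrame instead of a Series of arrays.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; np.nan stands in for "Nan" (assumption).
df = pd.DataFrame(
    [
        ["A", "B", "A", np.nan, np.nan],
        ["B", "D", "F", "A", np.nan],
        ["F", "A", "D", "B", "A"],
    ],
    columns=[f"Factor_{i}" for i in range(1, 6)],
)

# Long form: one line per (row label, column, value). With the default
# unnamed index, reset_index() names the row-label column "index".
long = df.reset_index().melt(id_vars="index", value_name="factor")

# Keep a single (row, factor) pair per row; drop_duplicates treats
# missing values as equal to each other, so NaN is kept once per row.
per_row = long.drop_duplicates(subset=["index", "factor"])

# Count in how many rows each factor occurs, including NaN.
counts = per_row["factor"].value_counts(dropna=False)
print(counts)
```

The resulting Series is indexed by factor, so `counts["A"]` gives the number of rows that contain A.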
Shubham Sharma answered Feb 02 '26 16:02