Percentage match in pandas DataFrame

Question

Is there a function that tells the percentage or number of matches in a pandas DataFrame without doing something like this...

len(trace_df[trace_df['ratio'] > 0]) / len(trace_df)
0.189

len(trace_df[trace_df['ratio'] <= 0]) / len(trace_df)
0.811

There must be a more Pythonic or at least elegant way of doing this.

Ted Petrou · Accepted Answer

The most Pythonic way to find the fraction of a column that satisfies a condition is to take the mean of the boolean expression, since True counts as 1 and False as 0. Multiply by 100 if you want a percentage.

(trace_df['ratio'] > 0).mean()
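
A minimal sketch of the same idea on made-up data (the values below are invented for illustration and are not the question's trace_df). It also shows that sum() on the same mask gives the number of matches, which the question asks about as well:

```python
import pandas as pd

# Hypothetical stand-in for the question's trace_df; the values are invented.
trace_df = pd.DataFrame({"ratio": [0.5, -0.2, 0.0, 1.3, -0.7]})

mask = trace_df["ratio"] > 0        # boolean Series: True where the row matches

n_matches = int(mask.sum())         # number of matching rows
frac_matches = float(mask.mean())   # fraction of matching rows, between 0 and 1
pct_matches = 100 * frac_matches    # the same value expressed as a percentage

print(n_matches, frac_matches, pct_matches)
```

Building the mask once and reusing it avoids filtering the DataFrame twice, as the len(...) / len(...) version does.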

JohnE · Answer

Ted's answer is good, of course; just consider this response an elaboration on it. If there are missing values, as there often are, note that they will also be treated as False: a comparison with NaN evaluates to False, and the resulting plain boolean Series has no way to mark a value as missing.

import numpy as np
import pandas as pd
ser = pd.Series([-1, 1, np.nan])
(ser > 0).mean()
0.33333333333333331

And similarly, the good point made by Jezrael is only true for Ted's answer if there are no missing values. (In this case the share above 0 and the share at or below 0 are each .333, so .333 + .333 != 1.)

That's not necessarily wrong (and it's the same as what the code in your question produces), but if you have missing values, you may prefer to add a little code to Ted's answer so that they are excluded first:

(ser[ser.notnull()] > 0).mean()
0.5

I hope this doesn't come across as a nit, but I think it's worth noting here because the default behavior of mean() is to exclude missing values, but when you take the mean of a boolean like this you are effectively including missing values, possibly leading to unexpected results.
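
To make the difference concrete, here is a small sketch using the same ser as above. It compares mean() on the raw floats, the boolean mean, and the mean after dropping missing values. The variable names are only illustrative, and reporting the missing share separately is just one possible way to handle it:

```python
import numpy as np
import pandas as pd

ser = pd.Series([-1, 1, np.nan])

# mean() on the float Series skips NaN by default (skipna=True): (-1 + 1) / 2
float_mean = ser.mean()

# The comparisons turn NaN into False, so the denominator is all 3 rows
share_pos_all = (ser > 0).mean()
share_nonpos_all = (ser <= 0).mean()

# Share of rows that are missing, measured against the same 3 rows
share_missing = ser.isna().mean()

# Dropping NaN first restricts the denominator to the 2 observed rows
share_pos_observed = (ser.dropna() > 0).mean()

print(float_mean, share_pos_all, share_nonpos_all, share_missing, share_pos_observed)
```

Once the missing share is counted as its own category, the three fractions over all rows sum to 1 again.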

Tags: python, pandas, pymc3

SARose
