
Percentage match in pandas DataFrame

Is there a function that tells the percentage or number of matches in a pandas DataFrame without doing something like this...

len(trace_df[trace_df['ratio'] > 0]) / len(trace_df)
0.189

len(trace_df[trace_df['ratio'] <= 0]) / len(trace_df)
0.811

There must be a more Pythonic or at least elegant way of doing this.

SARose asked Oct 19 '25 12:10


2 Answers

The most Pythonic way to find the fraction of a column that satisfies a condition is to take the mean of the boolean expression. True counts as 1 and False as 0, so the mean is the share of rows that match.

(trace_df['ratio'] > 0).mean()
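
Since the question asks for either the percentage or the number of matches, here is a minimal sketch of both, assuming a small made-up stand-in for trace_df (the values below are hypothetical, not the asker's data). The same boolean Series gives the fraction via mean(), the count via sum(), and both shares at once via value_counts(normalize=True).

```python
import pandas as pd

# Hypothetical stand-in for trace_df; the values are made up for illustration.
trace_df = pd.DataFrame({"ratio": [0.5, -1.0, 0.0, 2.0, -3.0]})

# Boolean Series: True where ratio > 0, False otherwise.
positive = trace_df["ratio"] > 0

fraction = positive.mean()  # share of rows that match (True counts as 1)
count = positive.sum()      # number of rows that match

# Fraction of True and of False rows in a single call.
shares = positive.value_counts(normalize=True)

print(fraction, count)
print(shares)
```

Multiply the fraction by 100 if an actual percentage is wanted.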

Ted Petrou answered Oct 21 '25 10:10


Ted's answer is good, of course; consider this response an elaboration on it. If there are missing values, as there often are, note that they will also be treated as False: any comparison with NaN evaluates to False, and the resulting plain bool Series has no way to mark a value as missing.

import numpy as np
import pandas as pd
ser = pd.Series([-1, 1, np.nan])
(ser > 0).mean()
0.3333333333333333

And similarly, the good point made by Jezrael is only true for Ted's answer if there are no missing values. (In this example the fractions for > 0 and <= 0 are each .333, so they sum to .667 rather than 1.)

That's not necessarily wrong (and it's the same as what the code in your question produces), but if you have missing values, you may prefer to filter them out before applying Ted's answer:

(ser[ser.notnull()] > 0).mean()
0.5

I hope this doesn't come across as a nit, but I think it's worth noting: the default behavior of mean() is to exclude missing values, but the comparison has already turned each missing value into False, so the mean of the boolean result effectively includes them, which can lead to unexpected results.
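
To make the difference concrete, here is a small sketch that wraps both behaviors in one function. The helper name fraction_positive and its skipna parameter are assumptions made up for this illustration (they are not part of pandas); the parameter just mirrors the naming that mean() itself uses.

```python
import numpy as np
import pandas as pd


def fraction_positive(ser, skipna=True):
    """Hypothetical helper: share of values in `ser` that are greater than 0."""
    if skipna:
        # Drop missing values first so they are not counted as non-matches.
        ser = ser.dropna()
    return (ser > 0).mean()


ser = pd.Series([-1, 1, np.nan])

with_nan = fraction_positive(ser, skipna=False)  # NaN counted as a non-match: 1 of 3
without_nan = fraction_positive(ser)             # NaN dropped from the denominator: 1 of 2

print(with_nan, without_nan)
```

Which of the two is right depends on whether a missing ratio should count as "not a match" or as "unknown" in your data.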

JohnE answered Oct 21 '25 10:10


