Is there a function that tells the percentage or number of matches in a pandas DataFrame without doing something like this...
len(trace_df[trace_df['ratio'] > 0]) / len(trace_df)
0.189
len(trace_df[trace_df['ratio'] <= 0]) / len(trace_df)
0.811
There must be a more Pythonic or at least elegant way of doing this.
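The most Pythonic way to find the fraction of a column that satisfies a condition is to take the mean of the boolean expression. True counts as 1 and False as 0, so the mean is the share of matching rows; use .sum() instead if you want the number of matches.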
(trace_df['ratio'] > 0).mean()
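As a minimal sketch of the same idea, the snippet below gets both the count and the fraction from one boolean mask. The small trace_df here is a made-up stand-in for the asker's data, not the real frame.

```python
import pandas as pd

# Hypothetical stand-in for the trace_df from the question
trace_df = pd.DataFrame({"ratio": [0.5, -1.0, 0.0, 2.0, -0.3]})

mask = trace_df["ratio"] > 0

n_matches = int(mask.sum())  # number of rows where the condition holds
share = mask.mean()          # fraction of rows where the condition holds

# Fractions for both outcomes (True and False) in one call
breakdown = mask.value_counts(normalize=True)

print(n_matches, share)
print(breakdown)
```

Building the mask once and reusing it avoids filtering the DataFrame twice, which is what the len(...) / len(...) version does.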
Ted's answer is good, of course; consider this response an elaboration on it. If there are missing values, as there often are, note that they will be treated as False: a comparison such as NaN > 0 evaluates to False, and the plain bool Series that results cannot hold missing values.
import numpy as np
import pandas as pd
ser = pd.Series([-1,1,np.nan])
(ser > 0).mean()
0.33333333333333331
And similarly, the good point made by Jezrael is only true for Ted's answer if there are no missing values. (In this case you will have .333 + .333 != 1)
That's not necessarily wrong (and it's the same as what the code in your question produces), but if you have missing values, you may prefer adding some additional code to Ted's answer:
(ser[ser.notnull()] > 0).mean()
0.5
I hope this doesn't come across as a nit, but I think it's worth noting here because the default behavior of mean() is to exclude missing values, but when you take the mean of a boolean like this you are effectively including missing values, possibly leading to unexpected results.
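To make the difference concrete, here is a small sketch using the same three-value Series. The variable names are only illustrative, and dropna() is used as an equivalent of the ser[ser.notnull()] filter.

```python
import numpy as np
import pandas as pd

ser = pd.Series([-1, 1, np.nan])

# NaN > 0 is False, so the missing row stays in the denominator
share_all = (ser > 0).mean()

# Drop missing rows first so only valid values are counted
share_valid = (ser.dropna() > 0).mean()

# Share of rows that are missing
share_missing = ser.isna().mean()

# Positive, non-positive, and missing shares together cover every row
total = (ser > 0).mean() + (ser <= 0).mean() + share_missing

print(share_all, share_valid, share_missing, total)
```

Which denominator is right depends on the question being asked: share_all answers "what fraction of all rows is positive", while share_valid answers "what fraction of the known values is positive".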