I am confused as to what I am doing incorrectly.
I have the following code:
import numpy as np
from scipy import stats
df
Out[29]: array([66., 69., 67., 75., 69., 69.])
val = 73.94
z1 = stats.percentileofscore(df, val)
print(z1)
Out[33]: 83.33333333333334
np.percentile(df, z1)
Out[34]: 69.999999999
I was expecting that np.percentile(df, z1) would give me back val = 73.94.
I think you're not quite understanding what percentileofscore and percentile actually do. They are not inverses of each other.
From the docs for scipy.stats.percentileofscore:

The percentile rank of a score relative to a list of scores.

A percentileofscore of, for example, 80% means that 80% of the scores in a are below the given score. In the case of gaps or ties, the exact definition depends on the optional keyword, kind.
So when you supply the value 73.94, there are 5 elements of df (66, 67, 69, 69 and 69) that fall below that score, and 5/6 gives you your 83.3333% result.
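A minimal sketch of that counting, using the array from the question. Because 73.94 ties with no element, the share of elements strictly below it is the whole story, and every value of the documented kind keyword agrees. The second call uses the score 69 (my own choice, not from the question), which appears three times, to show where the kind options diverge:

```python
import numpy as np
from scipy import stats

df = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94
kinds = ("rank", "weak", "strict", "mean")

# Share of elements strictly below val, as a percentage: 5 of the 6 scores
manual = np.mean(df < val) * 100
print(manual)

# 73.94 ties with no element, so every `kind` reports the same 83.33...
untied = {k: stats.percentileofscore(df, val, kind=k) for k in kinds}
print(untied)

# 69 appears three times, so the `kind` options now disagree:
# rank ~66.67, weak ~83.33, strict ~33.33, mean ~58.33
tied = {k: stats.percentileofscore(df, 69, kind=k) for k in kinds}
print(tied)
```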
Now, from the Notes for numpy.percentile:

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.
The default interpolation parameter (renamed method in NumPy 1.22) is 'linear', so:

'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
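To see that the default rule is only one choice, here is a sketch comparing 'linear' with a few of the other interpolation rules. It assumes NumPy 1.22 or newer, where the keyword is spelled method (older releases call it interpolation), and it hard-codes z1 as 500/6, the value percentileofscore returned. None of the rules returns 73.94, because each one either picks or blends the two sorted neighbours 69 and 75:

```python
import numpy as np

df = np.array([66., 69., 67., 75., 69., 69.])
z1 = 500 / 6  # the 83.33... that percentileofscore returned for 73.94

# Position of the q-th percentile in the sorted copy: q/100 * (N - 1) = 25/6 ~ 4.17,
# i.e. between sorted[4] = 69 and sorted[5] = 75
pos = z1 / 100 * (len(df) - 1)

# Compare the default 'linear' rule with a few alternatives (NumPy >= 1.22 keyword)
results = {m: float(np.percentile(df, z1, method=m))
           for m in ("linear", "lower", "higher", "nearest", "midpoint")}
print(pos, results)
# linear ~70.0, lower 69.0, higher 75.0, nearest 69.0, midpoint 72.0
```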
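Since you passed z1 ≈ 83.33 as the percentile, you're asking for the point 83.33/100 of the way through the sorted copy of your array, measured in index positions rather than in values. That position falls between the sorted values 69 and 75, and linear interpolation between them gives 70, not 73.94.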
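The same arithmetic written out for this array, using zero-based indices into the sorted copy 66, 67, 69, 69, 69, 75 (so N = 6) and q = 500/6 ≈ 83.33:

```latex
\begin{aligned}
\text{index} &= \frac{q}{100}\,(N-1) = \frac{500/6}{100}\times 5 = \frac{25}{6} \approx 4.17 \\
i &= V_{\text{sorted}}[4] = 69, \qquad j = V_{\text{sorted}}[5] = 75, \qquad \text{fraction} = \tfrac{1}{6} \\
i + (j - i)\cdot\text{fraction} &= 69 + 6\cdot\tfrac{1}{6} = 70
\end{aligned}
```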
If you're interested in digging through the source, you can find it in NumPy's implementation of percentile, but here is a simplified look at the calculation being done:
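import numpy as np
from scipy import stats

df = np.array([66., 69., 67., 75., 69., 69.])
z1 = stats.percentileofscore(df, 73.94)  # 83.33333333333334

ap = np.sort(df)                          # sorted copy of the data
Nx = df.shape[0]
indices = z1 / 100 * (Nx - 1)             # fractional position in the sorted copy
indices_below = np.floor(indices).astype(int)
indices_above = indices_below + 1         # simplified: no bounds check for q = 100
weight_above = indices - indices_below    # fractional part of the index
weight_below = 1 - weight_above
x1 = ap[indices_below] * weight_below     # 57.50000000000004
x2 = ap[indices_above] * weight_above     # 12.499999999999956
x1 + x2
70.0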
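If what you actually want is the inverse of np.percentile (a q such that np.percentile(df, q) gives back val), one option is to invert the linear rule directly: find where val sits between its sorted neighbours and map sorted positions 0 to N-1 linearly onto 0 to 100. This is my own suggestion, not a function NumPy or SciPy provides. The helper name inverse_percentile is hypothetical, and the sketch assumes the default 'linear' method and a score between min(df) and max(df):

```python
import numpy as np

def inverse_percentile(values, score):
    """Hypothetical helper: q such that np.percentile(values, q) == score
    under the default 'linear' method. Assumes min(values) <= score <= max(values)."""
    ap = np.sort(np.asarray(values, dtype=float))
    # Sorted position k corresponds to percentile 100 * k / (N - 1)
    positions_as_q = np.linspace(0.0, 100.0, ap.size)
    # Interpolating q between neighbouring sorted values undoes the 'linear' rule
    return float(np.interp(score, ap, positions_as_q))

df = np.array([66., 69., 67., 75., 69., 69.])
val = 73.94

q = inverse_percentile(df, val)
print(q)                     # ~96.47
print(np.percentile(df, q))  # ~73.94
```

This q (about 96.47) differs from the 83.33 that percentileofscore reports because the two functions answer different questions: one counts how many scores lie below val, the other interpolates along sorted positions. For a score equal to a tied value such as 69, any q from 40 to 80 returns 69, so the inverse is not unique and np.interp simply picks one of them.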