Let's take a DataFrame with one column of random values. I want the rank of each of these values, which is easy to get with:
df.rank()
But if the column contains duplicated values, their ranks are duplicated as well. For example, given this list of numbers:
[127.0, 131.856, 132.88, 126.249, 128.417, 124.336, 131.856, 130.624, 147.906, 134.412, 130.735, 133.433, nan, 125.59, 130.211, 133.847, 137.431, 130.0, 127.4, 132.226, 138.134]
the output of the rank function will be:
[4.0, 11.5, 14.0, 3.0, 6.0, 1.0, 11.5, 8.0, 20.0, 17.0, 9.0, 15.0, nan, 2.0, 7.0, 16.0, 18.0, 10.0, 5.0, 13.0, 19.0]
As you can see, positions 1 and 6 share the same rank (11.5), and neither 11 nor 12 appears anywhere in the list. How can we get a distinct rank for each of these numbers, even if it's arbitrary which one goes first?
Use the method parameter of rank. For example, with l being the list above without the nan entry:
pd.Series(l).rank(method='first')
0 4.0
1 11.0
2 14.0
3 3.0
4 6.0
5 1.0
6 12.0
7 9.0
8 20.0
9 17.0
10 10.0
11 15.0
12 2.0
13 8.0
14 16.0
15 18.0
16 7.0
17 5.0
18 13.0
19 19.0
dtype: float64
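For completeness, here is a small self-contained sketch that applies the same idea to a DataFrame column and keeps the nan entry in place. The column name "value" and the variable names are only illustrative assumptions, not something taken from the question.

```python
import numpy as np
import pandas as pd

# Values from the question, including the NaN entry.
values = [127.0, 131.856, 132.88, 126.249, 128.417, 124.336, 131.856,
          130.624, 147.906, 134.412, 130.735, 133.433, np.nan, 125.59,
          130.211, 133.847, 137.431, 130.0, 127.4, 132.226, 138.134]

# "value" is a hypothetical column name chosen for this example.
df = pd.DataFrame({"value": values})

# Default method="average": the two 131.856 entries share rank 11.5.
avg_rank = df["value"].rank()

# method="first": ties are broken by order of appearance,
# so every non-NaN value gets its own rank.
first_rank = df["value"].rank(method="first")

print(avg_rank[1], avg_rank[6])      # 11.5 11.5
print(first_rank[1], first_rank[6])  # 11.0 12.0
print(first_rank[12])                # nan (NaN stays unranked by default)
```

With method='first' the earlier of the two tied values gets the lower rank, which is fine when the order among ties is allowed to be arbitrary.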