I am looking for a way to take an ordered vector and return the percentage of the way through the vector that each value appears for the first time.
See below for the input vector and the expected result.
InputVector<-c(1,1,1,1,1,2,2,2,3,3)
ExpectedResult<-data.frame(Value=c(1,2,3), Percentile=c(0,0.5,0.8))
In this case, 1 appears at the 0th percentile, 2 at the 50th and 3 at the 80th.
In base R, with rle() and cumsum():
p <- with(rle(InputVector), cumsum(lengths) / sum(lengths))
c(0, p[-length(p)])
#[1] 0.0 0.5 0.8
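To make the intermediate steps explicit, here is a minimal sketch that wraps the same idea in a function and returns the data frame asked for in the question. The name first_percentile is hypothetical (it is not a base R function), and the sketch assumes the input is already sorted, since rle() only groups consecutive equal values.

```r
InputVector <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

# Hypothetical helper, not part of base R.
# Assumes x is sorted, so that each distinct value forms exactly one run.
first_percentile <- function(x) {
  r <- rle(x)                               # run values and run lengths
  before <- cumsum(r$lengths) - r$lengths   # elements preceding each run
  data.frame(Value = r$values, Percentile = before / length(x))
}

result <- first_percentile(InputVector)
print(result)
```

Subtracting each run's own length from the cumulative sum gives the count of elements before that run, which avoids the separate step of shifting the vector and prepending 0.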
Using rank() and unique():
data.frame(
  Value = InputVector,
  Percentile = (rank(InputVector, ties.method = "min") - 1) / length(InputVector)
) |>
  unique()
#>   Value Percentile
#> 1     1        0.0
#> 6     2        0.5
#> 9     3        0.8
You could also use dplyr::percent_rank(), but note that it computes percentiles differently: it divides by n - 1 rather than n, so the results do not match the expected output.
library(dplyr)
tibble(
  Value = InputVector,
  Percentile = percent_rank(Value)
) %>%
  distinct()
#> # A tibble: 3 × 2
#>   Value Percentile
#>   <dbl>      <dbl>
#> 1     1      0
#> 2     2      0.556
#> 3     3      0.889
Created on 2022-11-09 with reprex v2.0.2
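The difference between the two results comes down to the denominator. As a rough base R sketch (the variable names by_n and by_n_minus_1 are only illustrative), both variants can be reproduced from the same minimum ranks:

```r
InputVector <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

n <- length(InputVector)
min_rank <- rank(InputVector, ties.method = "min")  # 1 1 1 1 1 6 6 6 9 9

# Fraction of elements strictly before the first occurrence (divides by n)
by_n <- unique((min_rank - 1) / n)

# Same numerator divided by n - 1, which is how percent_rank() scales
by_n_minus_1 <- unique((min_rank - 1) / (n - 1))

print(by_n)
#> [1] 0.0 0.5 0.8
print(round(by_n_minus_1, 3))
#> [1] 0.000 0.556 0.889
```

Dividing by n - 1 maps the largest rank of a tie-free vector to exactly 1, whereas dividing by n gives the "share of the vector already passed" that the question asks for.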