I'm given arrays of numbers between 1 and 4, though the min and max usually differ by no more than .5. Distinct elements differ by at least .1. I want to find the smallest margin (range width) that contains at least 90% (or some other specified rate) of the elements.
That is, given the array
c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
I want my function to return .4 because 2.3 - 1.9 = .4 < 2.3 - 1 = 1.3.
I tried to build the function a few times, but it keeps growing overly complicated, and I'm wondering if there's a simple way to do this that I haven't considered.
Edit: it has to be able to handle skewed distributions. I don't have any completed examples of the code I produced, since I keep rewriting it, but I'll make something and post it.
Edit2: I can't provide any examples of the arrays I want to feed into the function, but here's a snippet for generating similar values. It doesn't matter that they fall outside the 1 to 4 range, as long as the approach works.
x = round(rbeta(20,5,2)*100)/10
The easiest way will be to brute force by testing all possible ranges that include 90%. To do this, we figure out how many terms that is, and what indices the ranges therefore can start at, and compute the difference for each, and then the minimum of those.
x <- c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
n <- ceiling(length(x)*0.9) # get the number of terms needed to include 90%
k <- 1 : (length(x) - n + 1) # get the possible indices the range can start at
x <- sort(x) # need them sorted...
d <- x[k + n - 1] - x[k] # get the difference starting at each range
min(d) # get the smallest difference
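The steps above can be wrapped into a reusable function with the rate as a parameter. This is only a sketch: the name `smallest_margin` and the `rate` argument are made up for illustration, and it assumes the input is a non-empty numeric vector.

```r
# Hypothetical helper: width of the narrowest window over the sorted values
# that holds at least `rate` of the elements.
smallest_margin <- function(x, rate = 0.9) {
  stopifnot(is.numeric(x), length(x) > 0, rate > 0, rate <= 1)
  x <- sort(x)                      # windows are taken over the sorted values
  n <- ceiling(length(x) * rate)    # number of elements each window must hold
  k <- seq_len(length(x) - n + 1)   # possible start indices of a window
  min(x[k + n - 1] - x[k])          # narrowest window width
}

smallest_margin(c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3))
# [1] 0.4
```

Because every candidate window is checked, the result does not depend on the distribution being symmetric, so skewed inputs should be handled the same way.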
Here's one way (same as @Aaron's, except using head/tail instead of x[i]). Note that x must already be sorted, as it is here:
x = c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
xn= length(x)
# number of elements to drop (floor, so that at least 90% are always kept;
# round() could drop one too many, e.g. 2 of 15)
n = floor(0.1*xn)
# achievable ranges
v = tail(x, n+1) - head(x, n+1)
min(v)
# [1] 0.4
Confirmation that a subvector of x dropping n elements really has this range:
n_up = which.min(v) - 1
n_dn = n-n_up
xs = x[(1 + n_up):(xn - n_dn)]
diff(range(xs))
# [1] 0.4
length(x) - length(xs) == n
# [1] TRUE
Testing on new example:
set.seed(1)
x0 = round(rbeta(20,5,2)*100)/10
x = sort(x0)
xn= length(x)
n = floor(0.1*xn) # floor, so that at least 90% are always kept
v = tail(x, n+1) - head(x, n+1)
min(v)
# [1] 4.1
# confirm...
n_up = which.min(v) - 1
n_dn = n-n_up
xs = x[(1 + n_up):(xn - n_dn)]
diff(range(xs))
# [1] 4.1
length(x) - length(xs) == n
# [1] TRUE
Partial sorting might be sufficient (just to get the top and bottom values on the ends); see ?sort.
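As a rough sketch of the partial-sorting idea: only the n+1 smallest and n+1 largest values need to be in their sorted positions, which `sort(x, partial = ...)` can arrange. The shuffled vector `x0` and the names `idx` and `xp` are made up for this illustration, and using `floor` for the number of dropped elements is an assumption chosen so that at least 90% are kept.

```r
# The question's values in shuffled order (illustrative input)
x0 <- c(2.3, 2, 1, 2.2, 2, 1.9, 2.3, 2, 2.1, 2)
xn <- length(x0)
n  <- floor(0.1 * xn)              # number of elements that may be dropped

# Positions that must hold their sorted values: both ends of the vector
idx <- unique(c(seq_len(n + 1), (xn - n):xn))
xp  <- sort(x0, partial = idx)     # the middle is left in unspecified order

v <- tail(xp, n + 1) - head(xp, n + 1)   # achievable ranges
min(v)
# [1] 0.4
```

Whether this is noticeably faster than a full sort probably depends on the vector length; for short vectors like the ones in the question the difference is unlikely to matter.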