What is the easiest way to find the smallest interval that contains 90% of the values in an array using R?

Question

I'm given arrays of numbers between 1 and 4, though the min and max usually differ by no more than .5. Distinct values differ by at least .1. I want to find the width of the smallest interval that contains at least 90% (or some other specified proportion) of the elements.

That is, given the array

c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)

I want my function to return .4 because 2.3 - 1.9 = .4 < 2.3 - 1 = 1.3. Details:

2.3 - 1.9 comes from the 90%-length subvector starting at 1.9 and running to the end
2.3 - 1 comes from the 90%-length subvector starting at 1 and ending at the first 2.3

I tried to build the function a few times, but it keeps growing overly complicated, and I'm wondering if there's a simple way to do this that I haven't considered.

Edit: it has to handle skewed distributions. I don't have a finished version of my own code to show, since I keep rewriting it, but I'll put something together and post it.

Edit2: I can't share the actual arrays I want to feed into the function, but here's a way to generate similar values. It doesn't matter that they fall outside the 1 to 4 range, as long as the function works.

x = round(rbeta(20,5,2)*100)/10

Aaron left Stack Overflow · Accepted Answer

The easiest way is brute force: test every possible range that includes 90% of the values. To do this, work out how many terms that is and which indices a range can therefore start at, compute the width of each range, and take the minimum.

x <- c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3)
n <- ceiling(length(x)*0.9)   # get the number of terms needed to include 90%
k <- 1 : (length(x) - n + 1)  # get the possible indices the range can start at
x <- sort(x)                  # need them sorted...
d <- x[k + n - 1] - x[k]      # width of the range starting at each index
min(d)                        # get the smallest difference
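
Since the question asks for "some other specified rate" as well, the same steps can be wrapped in a function that takes the rate as an argument. This is only a minimal sketch: the name `min_cover_width` and its `rate` argument are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: width of the narrowest window of sorted values
# that holds at least `rate` of the elements.
min_cover_width <- function(x, rate = 0.9) {
  stopifnot(length(x) > 0, rate > 0, rate <= 1)
  x <- sort(x)                       # windows are contiguous runs of sorted values
  n <- ceiling(length(x) * rate)     # number of elements each window must hold
  k <- seq_len(length(x) - n + 1)    # possible starting indices
  min(x[k + n - 1] - x[k])           # narrowest window
}

min_cover_width(c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3))
# [1] 0.4
```

Using `seq_len()` rather than `1:(...)` is a small defensive choice: it avoids a descending sequence if the count ever came out as zero.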

Frank · Answer

Here's one way (same as @Aaron's except head/tail instead of x[i]):

x = sort(c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3))  # head/tail below assume sorted input
xn= length(x)

# number of elements to drop (floor, so that at least 90% are kept)
n = floor(0.1*xn)

# achievable ranges
v = tail(x, n+1) - head(x, n+1)

min(v)
# [1] 0.4

Confirmation that a subvector of x dropping n elements really has this range:

n_up = which.min(v) - 1
n_dn = n-n_up

xs = x[(1 + n_up):(xn - n_dn)]

diff(range(xs))
# [1] 0.4
length(x) - length(xs) == n
# [1] TRUE

Testing on new example:

set.seed(1)
x0 = round(rbeta(20,5,2)*100)/10
x = sort(x0)
xn= length(x)

n = floor(0.1*xn)   # floor, so that at least 90% are kept
v = tail(x, n+1) - head(x, n+1)

min(v)
# [1] 4.1

# confirm...
n_up = which.min(v) - 1
n_dn = n-n_up    
xs = x[(1 + n_up):(xn - n_dn)]

diff(range(xs))
# [1] 4.1
length(x) - length(xs) == n
# [1] TRUE
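
If the endpoints of the interval are wanted rather than just its width, the head/tail idea extends naturally. The sketch below is an illustration, not part of the answer above: `min_cover_interval` is a hypothetical name, and it assumes the number of droppable elements is `xn - ceiling(rate * xn)`, chosen so that at least `rate` of the values always remain.

```r
# Hypothetical helper: endpoints and width of the narrowest interval
# covering at least `rate` of the values.
min_cover_interval <- function(x, rate = 0.9) {
  x  <- sort(x)
  xn <- length(x)
  n  <- xn - ceiling(rate * xn)      # elements that may be dropped
  lo <- head(x, n + 1)               # candidate lower endpoints
  hi <- tail(x, n + 1)               # matching upper endpoints
  i  <- which.min(hi - lo)           # first narrowest candidate
  c(lower = lo[i], upper = hi[i], width = hi[i] - lo[i])
}

min_cover_interval(c(1, 1.9, 2, 2, 2, 2, 2.1, 2.2, 2.3, 2.3))
# lower = 1.9, upper = 2.3, width = 0.4
```

When several candidates tie for the smallest width, `which.min` picks the first one, so the lowest such interval is returned.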

Partial sorting might be sufficient (just to get the top and bottom values on the ends); see ?sort.
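
As a rough sketch of that idea, `sort()` accepts a `partial` argument naming the positions that must end up holding their correctly sorted values. Here only the `n + 1` lowest and `n + 1` highest positions are requested; the variable names `idx` and `xp` are illustrative, and whether this is noticeably faster than a full sort for vectors this small has not been measured.

```r
set.seed(1)
x0 <- round(rbeta(20, 5, 2) * 100) / 10
xn <- length(x0)
n  <- floor(0.1 * xn)                       # number of elements to drop

# Positions that must be correct: the n + 1 smallest and n + 1 largest.
# unique() guards against overlap when the vector is very short.
idx <- unique(c(seq_len(n + 1), (xn - n):xn))
xp  <- sort(x0, partial = idx)

v <- tail(xp, n + 1) - head(xp, n + 1)      # achievable ranges, as before
min(v)

# Cross-check against a full sort
xs <- sort(x0)
all.equal(v, tail(xs, n + 1) - head(xs, n + 1))
```

The middle of `xp` is not guaranteed to be in order, so it should only be used for the end values.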

Tags:

r

chance.will
