I simulated numbers from a Normal Distribution centered around 50 and then kept numbers only between 0 and 100:
mean <- 50
sd <- 15
n <- 100000
set.seed(123)
numbers <- rnorm(n, mean, sd)
numbers <- pmax(0, pmin(numbers, 100))
As expected, the results look like a bell curve:
Then, I wanted to see what percentage of numbers are between each unit interval:
min_interval <- seq(0, 99, by = 1)
max_interval <- seq(1, 100, by = 1)
count <- sapply(min_interval, function(x) {
sum(numbers >= x & numbers < (x + 1))
})
percentage <- count / length(numbers) * 100
df <- data.frame(min_interval = min_interval,
max_interval = max_interval,
count = count,
percentage = percentage)
The results look something like this:
library(dplyr)
df %>% arrange(desc(percentage))
min_interval max_interval count percentage
1 50 51 2714 2.714
2 48 49 2632 2.632
3 49 50 2628 2.628
4 52 53 2626 2.626
5 47 48 2615 2.615
#> sum(df$percentage)
#[1] 99.957
My Question: Suppose I wanted to make the following changes to this results table (df) such that:
How can I re-calculate the percentage column such that these conditions are met? Can a general function be written for this problem? (e.g. enter which constraints you want, and an optimization algorithm finds the corresponding bell curve)
Related Question
It's great that you're using simulations to learn statistics, that's an enjoyable way to learn.
the function pnorm
tells you the proportion of a normal distribution (also sometimes called a "bell curve") that is below a certain value. You can find the proportion of a normal distribution that will be between two points by differencing their value in pnorm
. For example, with your parameters, the expected proportion between 49 and 50 is
pnorm(50,mean=50, sd=15) - pnorm(49, mean=50, sd=15)
# 0.02657646
or about 2.657%, no so far off your simulated 2.628%.
To find a normal distribution with 5% in this range, you can make a function that is the proportion in a range
f1 <- function(x) {
pnorm(50,mean=50,sd=x) - pnorm(49, mean=50, sd=x)
}
plot(x <- seq(0,10,len=100),f1(x), type="l")
# horizontal line at 0.05, or 5%
abline(0.05,0)
so you can no pick off the plot where this property holds.
If you want to solve for that, use optimize
to find the spot where this crossing happens. The crossing is when f1(x) - 0.05
is zero, so define f2
as
f2 <- function(x) {
abs(f1(x) - 0.05)
}
The absolute value allows us to minimize the function at the unique solution
optimize(lower=5,upper=10,f=f2)
# $minimum
# [1] 7.957901
#
# $objective
# [1] 3.038535e-08
this shows that if you set the standard deviation to 7.956, you should get about 5% of the distribution between 49 and 50.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With