Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reverse Engineering a Bell Curve

Tags:

r

I simulated numbers from a Normal Distribution centered around 50 and then kept numbers only between 0 and 100:

mean <- 50
sd <- 15
n <- 100000

set.seed(123)  
numbers <- rnorm(n, mean, sd)

numbers <- pmax(0, pmin(numbers, 100))

As expected, the results look like a bell curve:

enter image description here

Then, I wanted to see what percentage of numbers are between each unit interval:

min_interval <- seq(0, 99, by = 1)
max_interval <- seq(1, 100, by = 1)


count <- sapply(min_interval, function(x) {
    sum(numbers >= x & numbers < (x + 1))
})
percentage <- count / length(numbers) * 100

df <- data.frame(min_interval = min_interval,
                 max_interval = max_interval,
                 count = count,
                 percentage = percentage)

The results look something like this:

library(dplyr)
df %>% arrange(desc(percentage))

    min_interval max_interval count percentage
1             50           51  2714      2.714
2             48           49  2632      2.632
3             49           50  2628      2.628
4             52           53  2626      2.626
5             47           48  2615      2.615

#> sum(df$percentage)
#[1] 99.957

My Question: Suppose I wanted to make the following changes to this results table (df) such that:

  • (min_interval = 50, max_interval = 51) has a percentage of 5%
  • (min_interval = 49, max_interval = 50) has a percentage of 5%
  • The percentage columns still sums to 1
  • The plot still keeps its approximate shape

How can I re-calculate the percentage column such that these conditions are met? Can a general function be written for this problem? (e.g. enter which constraints you want, and an optimization algorithm finds the corresponding bell curve)

Related Question

  • https://math.stackexchange.com/questions/4804927/calculating-the-probabilities-of-a-100-sided-weighted-dice
like image 941
stats_noob Avatar asked Sep 12 '25 03:09

stats_noob


1 Answers

It's great that you're using simulations to learn statistics, that's an enjoyable way to learn.

the function pnorm tells you the proportion of a normal distribution (also sometimes called a "bell curve") that is below a certain value. You can find the proportion of a normal distribution that will be between two points by differencing their value in pnorm. For example, with your parameters, the expected proportion between 49 and 50 is

pnorm(50,mean=50, sd=15) - pnorm(49, mean=50, sd=15)
# 0.02657646

or about 2.657%, no so far off your simulated 2.628%.

To find a normal distribution with 5% in this range, you can make a function that is the proportion in a range

f1 <- function(x) {
  pnorm(50,mean=50,sd=x) - pnorm(49, mean=50, sd=x)
}
plot(x <- seq(0,10,len=100),f1(x), type="l")
# horizontal line at 0.05, or 5%
abline(0.05,0)

so you can no pick off the plot where this property holds.

If you want to solve for that, use optimize to find the spot where this crossing happens. The crossing is when f1(x) - 0.05 is zero, so define f2 as

f2 <- function(x) {
  abs(f1(x) - 0.05)
}

The absolute value allows us to minimize the function at the unique solution

optimize(lower=5,upper=10,f=f2)
# $minimum
# [1] 7.957901
# 
# $objective
# [1] 3.038535e-08

this shows that if you set the standard deviation to 7.956, you should get about 5% of the distribution between 49 and 50.

like image 198
pdb Avatar answered Sep 14 '25 16:09

pdb