I'm writing some code in R in which I need to generate a data set whose values come from different distributions depending on which condition is met.
I have three probabilities: A = 0.423, B = 0.324 and C = 0.253.
I want to draw a random sample with runif(50, 0, 1).
If the number generated lies between 0 and 0.423, I want to generate a value from rnorm(50, 25, 4);
if it's between 0.423 and 0.747, I want to generate a value from rnorm(50, 28, 4.5);
and finally, if it's between 0.747 and 1, I want to generate a value from rnorm(50, 30, 5).
I was trying to do this using some sort of compound ifelse function, but to no avail.
Any suggestions?
Cheers
As you suspected, this can be done using ifelse:
u = runif(50, 0, 1)
values = ifelse(u < .423, rnorm(50, 25, 4),
         ifelse(u < .747, rnorm(50, 28, 4.5),
                rnorm(50, 30, 5)))
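This works because ifelse is vectorized: for each position it takes the element from the "yes" or the "no" vector according to the test at that position. Note that each rnorm call still draws all 50 values, and ifelse keeps only the ones that are selected.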
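To make the element-wise selection concrete, here is a small sketch with fixed inputs in place of the random draws. The vectors u, from_a, from_b and from_c are made-up illustration values, not part of the original answer.

```r
# Hypothetical fixed inputs so the result can be checked by eye
u      <- c(0.10, 0.50, 0.90)
from_a <- c(1, 2, 3)        # stands in for rnorm(3, 25, 4)
from_b <- c(10, 20, 30)     # stands in for rnorm(3, 28, 4.5)
from_c <- c(100, 200, 300)  # stands in for rnorm(3, 30, 5)

# The nested ifelse picks, position by position, from one of the three vectors
picked <- ifelse(u < 0.423, from_a,
          ifelse(u < 0.747, from_b, from_c))
print(picked)
## [1]   1  20 300
```

Position 1 falls below 0.423 and takes from_a[1], position 2 falls between 0.423 and 0.747 and takes from_b[2], and position 3 takes from_c[3].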
You might prefer the following method, especially if you end up having more than three sections to break it into:
means = c(25, 28, 30)
sds = c(4, 4.5, 5)  # standard deviations (the third argument of rnorm), not variances
probs = c(.423, .324, .253)
samples = sample(1:3, 50, replace=TRUE, prob=probs)
values = rnorm(50, means[samples], sds[samples])
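As a rough sanity check of the sample-based approach, the same idea can be wrapped in a function and run with a large n, so that the sample mean can be compared with the theoretical mixture mean. This is only a sketch: the helper name r_mixture, the seed, and the sample size are assumptions made for illustration.

```r
# r_mixture is a hypothetical helper name, not part of the original answer
r_mixture <- function(n, probs, means, sds) {
  # Pick a component for every draw, then let rnorm recycle over the
  # per-draw means and standard deviations
  component <- sample(seq_along(probs), n, replace = TRUE, prob = probs)
  rnorm(n, mean = means[component], sd = sds[component])
}

set.seed(123)  # assumption: fixed seed only so the check is reproducible
probs <- c(0.423, 0.324, 0.253)
means <- c(25, 28, 30)
sds   <- c(4, 4.5, 5)

big_sample  <- r_mixture(1e5, probs, means, sds)
theory_mean <- sum(probs * means)  # 0.423*25 + 0.324*28 + 0.253*30 = 27.237

print(theory_mean)
print(mean(big_sample))  # should land close to theory_mean
```

With more components, only the three parameter vectors need to grow; the function body stays the same.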
You can use a combination of switch, findInterval and either do.call or mapply.
Which one you use may depend on how you want the output ordered:
A <- 0.423
B <- 0.324
C <- 0.253
interval <- c(0, cumsum(c(A, B, C)))
random <- runif(50, 0, 1)
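# Map an interval index (1, 2 or 3) to the parameters of its normal distribution
rnorm_vals <- function(interval) {
  switch(interval,
         `1` = list(mean = 25, sd = 4),
         `2` = list(mean = 28, sd = 4.5),
         `3` = list(mean = 30, sd = 5))
}
# Interval index of each uniform draw
intervals <- findInterval(random, interval, rightmost.closed = TRUE, all.inside = TRUE)
# Number of draws that fell into each interval
how_many <- table(intervals)
# One rnorm call per interval, generating as many values as were drawn there
rnorm_calls <- lapply(names(how_many), function(nm) {
  do.call(rnorm, c(n = how_many[[nm]], rnorm_vals(nm)))
})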
rnorm_calls
## [[1]]
## [1] 28.38 18.54 20.52 25.79 27.50 36.86 23.40 20.82 23.75 30.64 27.62
## [12] 20.23 24.21 22.21 20.16 21.73 29.96 27.28 25.68
##
## [[2]]
## [1] 24.91 30.90 30.35 26.08 18.74 27.33 23.29 33.80 30.37 27.49 36.78
## [12] 23.03 24.66 21.46 22.62 33.32 16.16
##
## [[3]]
## [1] 30.62 29.50 31.19 25.07 24.68 33.25 40.04 33.42 30.81 21.48 32.79
## [12] 25.56 30.98 30.16
##
# or flatten to a single vector (values will be grouped by interval, not in draw order)
unlist(rnorm_calls)
## [1] 28.38 18.54 20.52 25.79 27.50 36.86 23.40 20.82 23.75 30.64 27.62
## [12] 20.23 24.21 22.21 20.16 21.73 29.96 27.28 25.68 24.91 30.90 30.35
## [23] 26.08 18.74 27.33 23.29 33.80 30.37 27.49 36.78 23.03 24.66 21.46
## [34] 22.62 33.32 16.16 30.62 29.50 31.19 25.07 24.68 33.25 40.04 33.42
## [45] 30.81 21.48 32.79 25.56 30.98 30.16
Or you could look up the parameters for each draw and make a single call to rnorm per element, which keeps the values in the order of the original draws:
rnorm_details <- lapply(intervals, rnorm_vals)
means <- sapply(rnorm_details, `[[`, "mean")
sds <- sapply(rnorm_details, `[[`, "sd")
mapply(rnorm, n = 1, sd = sds, mean = means)
## [1] 24.70 24.33 36.22 21.87 25.40 18.39 23.54 26.90 25.81 17.35 19.45
## [12] 18.53 33.88 27.51 21.70 25.06 24.17 35.08 28.24 21.88 29.90 28.18
## [23] 29.08 24.54 30.43 29.27 28.79 22.09 20.77 18.82 21.73 22.99 26.69
## [34] 27.73 26.66 25.32 29.95 30.77 34.89 28.55 22.95 32.45 36.11 29.72
## [45] 39.23 22.39 26.78 23.36 18.06 33.39
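If the grouped approach (one rnorm call per interval) is preferred but the values still need to line up with the original uniform draws, one possible sketch is to write each group back into the positions that selected it. The names breaks, params, by_group and values, as well as the seed, are assumptions introduced for this illustration.

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

# Interval boundaries and per-interval parameters (hypothetical names)
breaks <- c(0, cumsum(c(0.423, 0.324, 0.253)))
params <- data.frame(mean = c(25, 28, 30), sd = c(4, 4.5, 5))

random    <- runif(50)
intervals <- findInterval(random, breaks, rightmost.closed = TRUE, all.inside = TRUE)

# One rnorm call per interval, as in the grouped version
by_group <- lapply(1:3, function(k) {
  rnorm(sum(intervals == k), mean = params$mean[k], sd = params$sd[k])
})

# Write each group's values back to the positions of the draws that chose it
values <- numeric(length(random))
for (k in 1:3) {
  values[intervals == k] <- by_group[[k]]
}

head(data.frame(random, intervals, values))
```

Each row of the resulting data frame pairs a uniform draw with the interval it fell into and the normal value generated for it.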