As seen in this question, on Windows it is not possible to run parallel R processes with shared memory. I have therefore devised the following methodology, using a series of set.seed() calls, to simulate a "shared memory" between my cores.
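The trick relies on a basic property of seeding: restarting the generator from the same seed reproduces the same stream, so a longer draw is an exact extension of a shorter one. A minimal illustration in plain R (no parallelism; variable names are mine):

set.seed(123)
x.short <- rnorm(5)              ## first 5 draws of the stream
set.seed(123)                    ## restart the stream from the same seed
x.long <- rnorm(10)              ## first 10 draws of the same stream
identical(x.short, x.long[1:5])  ## TRUE: the short draw is a prefix of the long one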
First, I want to simulate a matrix of random variables (RVs).
Second, I want to repeat the above a certain number of times, precisely nruns times; at each run, any metric (mean, quantile, ...) can be computed from this matrix.
nruns <- 4                   ## number of runs per core
nsim.runs <- c(5000, 10000)  ## number of simulations per core
nport.total <- 10            ## total number of portfolio positions
n.sectors <- 2
K <- n.sectors
ISCorrMatrix <- matrix(data=0.5, nrow=n.sectors, ncol=n.sectors, byrow=TRUE)
diag(ISCorrMatrix) <- 1                ## inter-sector correlation matrix
Alpha.Matrix <- t(chol(ISCorrMatrix))  ## its lower-triangular Cholesky factor
Alpha.Matrix.extended <- NULL
for (i in 1:n.sectors) {
  ## replicate each sector's factor-loading row for its nport.total/2 positions
  new <- matrix(data=Alpha.Matrix[i, ], nrow=nport.total/2, ncol=n.sectors,
                byrow=TRUE)
  Alpha.Matrix.extended <- rbind(Alpha.Matrix.extended, new)
}
beta.star <- runif(n.sectors, min=0.1, max=0.9)  ## draw the loadings at random
beta.star <- beta.star/sqrt(sum(beta.star^2))    ## normalise the draw to unit length
sum(beta.star^2)                                 ## check: should equal 1
## Initialize parallel backend
num_cores <- length(nsim.runs)
library(doParallel)
cl <- makeCluster(num_cores)
registerDoParallel(cl)
## Set up a reproducible RNG stream
clusterSetRNGStream(cl)
parallel_results <- foreach(core=1:length(nsim.runs), .combine="c") %dopar% {
  nsim <- nsim.runs[core]
  results <- list()
  set.seed(123)  ## same seed on every core, so all cores share a common stream
  for (run in 1:nruns) {
    Z.systematic <- matrix(data=rnorm(nsim*K), nrow=K, ncol=nsim, byrow=FALSE)
    Y.systematic <- Alpha.Matrix %*% Z.systematic
    Y.systematic.star <- beta.star %*% Z.systematic
    Y.systematic.extended <- c()
    for (i in 1:n.sectors) {
      new <- matrix(data=rep(Y.systematic[i, ], nport.total/2),
                    nrow=nport.total/2, ncol=nsim, byrow=TRUE)
      Y.systematic.extended <- rbind(Y.systematic.extended, new)
    }
    Y.systematic.extended.star <-
      matrix(data=Y.systematic.star, nrow=nport.total, ncol=nsim, byrow=TRUE)
    set.seed(123 + run*100000)  ## run-specific seed for the idiosyncratic draws
    Z.Default <- matrix(data=rnorm(nsim*nport.total), nrow=nport.total,
                        ncol=nsim, byrow=FALSE)
    set.seed(123 + run*100000)  ## the same run-specific seed, reused for the LGDs
    LGD.Basel <- matrix(data=rbeta(n=nsim*nport.total, shape1=2, shape2=3),
                        nrow=nport.total, ncol=nsim, byrow=FALSE)
    results[[paste0("run_", run, "_Nsim_", nsim)]] <-
      list(Y.systematic=Y.systematic.extended.star,
           Z.Default=Z.Default,
           LGD=LGD.Basel)
  }
  return(results)  ## the return value of the parallelised block
}
## Stop the parallel backend
stopCluster(cl)
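One subtlety worth flagging: the clusterSetRNGStream(cl) call above has no lasting effect here, because every worker immediately calls set.seed(123), which re-seeds the generator and discards the per-worker stream it was given. A minimal sketch of this behaviour (the 2-worker cluster and variable names are illustrative):

library(parallel)
cl2 <- makeCluster(2)
clusterSetRNGStream(cl2, iseed=42)  ## one independent stream per worker
streamed <- parSapply(cl2, 1:2, function(i) rnorm(1))
reseeded <- parSapply(cl2, 1:2, function(i) { set.seed(123); rnorm(1) })
streamed  ## two different values: the per-worker streams are independent
reseeded  ## two identical values: set.seed() has overridden the streams
stopCluster(cl2)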
If we focus on one run across the cores, I want the following: the methodology should replicate a situation in which you first simulate one large matrix of 10000 Monte Carlo simulations, then assign the first 5000 columns to the first core, the full 10000 columns to the second core, and so on (if there are more cores with more simulations).
If we instead focus on multiple runs on the same core, I want each rerun of its nsim simulations to generate a new random matrix, independent of the one generated in the previous run, and so on.
Visually, when the above methodology is applied to compute one quantile per run, across a series of cores each with a different nsim, this yields:

[figure: one quantile per run, per core, with each core using a different nsim]
My code seems to achieve this (here I show only Y and Z, but it also works for the LGD):
Stability for same run across nsims: Y
> parallel_results$run_1_Nsim_5000$Y.systematic[1:2,4998:5000]
[,1] [,2] [,3]
[1,] -0.2397531 1.548914 0.2781637
[2,] -0.2397531 1.548914 0.2781637
> parallel_results$run_1_Nsim_10000$Y.systematic[1:2,4998:5002]
[,1] [,2] [,3] [,4] [,5]
[1,] -0.2397531 1.548914 0.2781637 2.23994 -0.6209843
[2,] -0.2397531 1.548914 0.2781637 2.23994 -0.6209843
Variability for different runs across same nsim: Y
> parallel_results$run_1_Nsim_5000$Y.systematic[1:2,1:2]
[,1] [,2]
[1,] -0.3996827 0.2019272
[2,] -0.3996827 0.2019272
> parallel_results$run_2_Nsim_5000$Y.systematic[1:2,1:2]
[,1] [,2]
[1,] -0.4580132 -0.08322681
[2,] -0.4580132 -0.08322681
Stability for same run across nsims: Z
> parallel_results$run_1_Nsim_5000$Z.Default[1:2,4998:5000]
[,1] [,2] [,3]
[1,] -1.077092 -0.006225626 -0.6152041
[2,] 2.033250 -1.350659878 -0.7040274
> parallel_results$run_1_Nsim_10000$Z.Default[1:2,4998:5002]
[,1] [,2] [,3] [,4] [,5]
[1,] -1.077092 -0.006225626 -0.6152041 0.57472772 2.593232
[2,] 2.033250 -1.350659878 -0.7040274 -0.08696669 -1.456689
Variability for different runs across same nsim: Z
> parallel_results$run_1_Nsim_5000$Z.Default[1:2,1:2]
[,1] [,2]
[1,] 0.0008058532 -1.0539683
[2,] -1.3029004815 -0.2031874
> parallel_results$run_2_Nsim_5000$Z.Default[1:2,1:2]
[,1] [,2]
[1,] 0.50913790 -1.120769
[2,] -0.03775884 0.894895
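The same checks can also be run programmatically rather than by eye; a short sketch, assuming the parallel_results object produced above:

## Prefix property: for the same run, the 5000-column matrix should be an
## exact prefix of the 10000-column matrix.
all.equal(parallel_results$run_1_Nsim_5000$Z.Default,
          parallel_results$run_1_Nsim_10000$Z.Default[, 1:5000])

## Cross-run linkage: the correlation should be near 0, and no value should
## repeat exactly (an exact repeat would point to overlapping streams).
z1 <- as.vector(parallel_results$run_1_Nsim_5000$Z.Default)
z2 <- as.vector(parallel_results$run_2_Nsim_5000$Z.Default)
cor(z1, z2)
any(z1 %in% z2)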
Question:
My question is now: are the nruns draws within the same core truly independent (I have checked the summary statistics and the correlations, and they seem to be)? Or have I introduced some undesired link through the set.seed() calls?
Are there considerations linked to the RNG in R (and its interaction with the parallel package) that I have missed? I have searched extensively across Stack Overflow but was unable to find anything matching my application.
Not sure if I understand you correctly, but you're doing something like this:
> library(doParallel)
> cl <- makeCluster(num_cores)
> registerDoParallel(cl)
>
> foreach(core=1:3, .combine="list") %dopar% {
+ rn <- array(, c(2, 2))
+ for (run in seq_len(ncol(rn))) {
+ set.seed(41 + run)
+ rn[, run] <- rnorm(2)
+ }
+ rn
+ }
[[1]]
[[1]][[1]]
[,1] [,2]
[1,] 1.3709584 -0.03751376
[2,] -0.5646982 -1.57460441
[[1]][[2]]
[,1] [,2]
[1,] 1.3709584 -0.03751376
[2,] -0.5646982 -1.57460441
[[2]]
[,1] [,2]
[1,] 1.3709584 -0.03751376
[2,] -0.5646982 -1.57460441
which yields identical values on every core. If that's not your intention, you might want to add the iteration index i to the seed, though this is not ideal: manually offset seeds carry no guarantee that the resulting pseudo-random streams are independent.
>
> stopCluster(cl)
>
> library(parallel)
> cl <- makeCluster(num_cores)
>
> parLapply(cl, 1:3, \(i) {
+ rn <- array(, c(2, 2))
+ for (run in seq_len(ncol(rn))) {
+ set.seed(41 + run + i)
+ rn[, run] <- rnorm(2)
+ }
+ rn
+ })
[[1]]
[,1] [,2]
[1,] -0.03751376 0.65391826
[2,] -1.57460441 0.01905227
[[2]]
[,1] [,2]
[1,] 0.65391826 0.3407997
[2,] 0.01905227 -0.7033403
[[3]]
[,1] [,2]
[1,] 0.3407997 -0.8989165
[2,] -0.7033403 0.2121311
>
> stopCluster(cl)
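For completeness, R's built-in answer to this problem is L'Ecuyer-CMRG streams: clusterSetRNGStream(cl, iseed) gives every worker a provably non-overlapping stream with no manual seed arithmetic. A short sketch (it guarantees independence across workers and runs, but deliberately does not reproduce the shared-prefix behaviour of your set.seed() approach):

library(parallel)
cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed=2023)  ## independent L'Ecuyer-CMRG stream per worker
res <- parLapply(cl, 1:2, function(i) {
  ## all draws on a worker come from its own stream; no set.seed() needed
  replicate(2, rnorm(2), simplify=FALSE)  ## two "runs" of independent draws
})
stopCluster(cl)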