Shared memory in parallel foreach using set.seed

As seen in this question, on Windows it is not possible to run parallel processes with shared memory in R. I have therefore devised the following methodology, using a series of set.seed() calls, to simulate "shared memory" between my cores.
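
The underlying mechanism is simply that identical seeds yield identical draws, even in separate R processes. A minimal sketch:

## Identical seeds reproduce identical pseudo-random draws, which is what
## lets separate worker processes "share" the same simulated columns.
set.seed(1); a <- rnorm(3)
set.seed(1); b <- rnorm(3)
identical(a, b)  ## TRUE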

First, I want to simulate a random variable (RV) matrix:

  1. The rows represent the fixed number of loans.
  2. The columns represent the number of Monte Carlo simulations.

Second, I want to repeat the above a fixed number of times, nruns. At each of the nruns runs, any metric (mean, quantile, ...) can be computed from this matrix, as the toy sketch below illustrates.
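
A toy illustration of one run's RV matrix, with hypothetical small sizes rather than the actual parameters used below:

## Toy RV matrix for one run: rows = loans, columns = Monte Carlo simulations.
n.loans <- 4; n.mc <- 6  ## hypothetical small sizes
rv <- matrix(rnorm(n.loans*n.mc), nrow=n.loans, ncol=n.mc)
apply(rv, 1, mean)  ## a per-loan metric computed from this run's matrix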

nruns <- 4                   ## number of repeated runs
nsim.runs <- c(5000, 10000)  ## Monte Carlo simulations per core
nport.total <- 10            ## fixed number of loans in the portfolio
n.sectors <- 2
K <- n.sectors

## Inter-sector correlation matrix with unit diagonal.
ISCorrMatrix <- matrix(data=0.5, nrow=n.sectors, ncol=n.sectors, byrow=TRUE)
diag(ISCorrMatrix) <- 1

## Lower-triangular Cholesky factor, used to correlate the systematic factors.
Alpha.Matrix <- t(chol(ISCorrMatrix))

## Repeat each sector's factor loadings once per loan in that sector
## (nport.total/2 loans per sector).
Alpha.Matrix.extended <- NULL

for (i in 1:n.sectors) {
  new <- matrix(data=Alpha.Matrix[i, ], nrow=nport.total/2, ncol=n.sectors, 
                byrow=TRUE)
  Alpha.Matrix.extended <- rbind(Alpha.Matrix.extended, new)
}

beta.star <- runif(n.sectors, min=0.1, max=0.9)  ## draw the factor weights at random
beta.star <- beta.star/sqrt(sum(beta.star^2))  ## normalise so the squared weights sum to 1

sum(beta.star^2)  ## sanity check: equals 1 after normalisation

## Initialize parallel backend
num_cores <- length(nsim.runs)  ## one worker per simulation size
library(doParallel)
cl <- makeCluster(num_cores)
registerDoParallel(cl)

## Set up reproducible L'Ecuyer-CMRG streams on the workers (note: these
## streams are overridden by the explicit set.seed() calls inside the loop).
clusterSetRNGStream(cl)

parallel_results <- foreach(core=1:length(nsim.runs), .combine="c") %dopar% {
  nsim <- nsim.runs[core]
  results <- list()
  set.seed(123)  ## same base seed on every core: this is what shares columns across cores
  for (run in 1:nruns) { 
    ## Systematic factor draws; the matrix is filled column by column, so the
    ## first columns coincide across cores that start from the same seed.
    Z.systematic <- matrix(data=rnorm(nsim*K), nrow=K, ncol=nsim, byrow=FALSE)
    Y.systematic <- Alpha.Matrix %*% Z.systematic    ## correlated sector factors
    Y.systematic.star <- beta.star %*% Z.systematic  ## weighted factor combination
    Y.systematic.extended <- c()
    for (i in 1:n.sectors) {
      new <- matrix(data=rep(Y.systematic[i, ], nport.total/2), 
                    nrow=nport.total/2, ncol=nsim, byrow=TRUE)
      Y.systematic.extended <- rbind(Y.systematic.extended, new)
    }
    Y.systematic.extended.star <- 
      matrix(data=Y.systematic.star, nrow=nport.total, ncol=nsim, byrow=TRUE)
    set.seed(123 + run*100000)  ## run-specific seed for the idiosyncratic draws
    Z.Default <- matrix(data=rnorm(nsim*nport.total), nrow=nport.total,
                        ncol=nsim, byrow=FALSE) 
    set.seed(123 + run*100000)  ## the same seed again: LGD.Basel is driven by
                                ## the same underlying uniform stream as Z.Default
    LGD.Basel <- matrix(data=rbeta(n=nsim*nport.total, shape1=2, shape2=3),
                        nrow=nport.total, ncol=nsim, byrow=FALSE)
    results[[paste0("run_", run, "_Nsim_", nsim)]] <- 
      list(Y.systematic=Y.systematic.extended.star, 
           Z.Default=Z.Default, 
           LGD=LGD.Basel)
  }
  return(results)  ## the value this worker returns to the master
}

## Stop the parallel backend
stopCluster(cl)

If we focus on one run across the cores, I want the following:

  1. The first core simulates 5000 columns (Monte Carlo simulations).
  2. The second core simulates 10000 columns (Monte Carlo simulations). To avoid simulation noise, the first 5000 columns must contain the same simulated RVs as on the first core, and so on as more cores are added to the parallelisation.

This methodology aims to replicate a situation in which you first simulate a large matrix of 10000 Monte Carlo simulations and then assign the first 5000 columns to the first core, the full 10000 columns to the second core, and so on (if there are more simulation sizes); a single-threaded sketch of this follows.
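
A minimal single-threaded sketch of that situation, reusing K and nsim.runs from above. Because rnorm() fills the matrix column by column, the smaller matrix is exactly the left block of the larger one when both start from the same seed:

set.seed(123)
Z.small <- matrix(rnorm(K*nsim.runs[1]), nrow=K, ncol=nsim.runs[1])
set.seed(123)
Z.big   <- matrix(rnorm(K*nsim.runs[2]), nrow=K, ncol=nsim.runs[2])
identical(Z.small, Z.big[, 1:nsim.runs[1]])  ## TRUE: shared first 5000 columns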

If we focus on multiple runs on the same core, I want each run (for the same nsim) to produce a newly generated random matrix, independent of the one generated in the previous run, and so on.

Visually, when the above methodology is applied to compute one quantile per run, across a series of cores each with a different nsim, this yields:

[Figure: one quantile per run, plotted across cores with increasing nsim]

My code seems to achieve this (here I show only Y and Z, but it also works for the LGD):

Stability for same run across nsims: Y

> parallel_results$run_1_Nsim_5000$Y.systematic[1:2,4998:5000]
           [,1]     [,2]      [,3]
[1,] -0.2397531 1.548914 0.2781637
[2,] -0.2397531 1.548914 0.2781637
> parallel_results$run_1_Nsim_10000$Y.systematic[1:2,4998:5002]
           [,1]     [,2]      [,3]    [,4]       [,5]
[1,] -0.2397531 1.548914 0.2781637 2.23994 -0.6209843
[2,] -0.2397531 1.548914 0.2781637 2.23994 -0.6209843

Variability for different runs across same nsim: Y

> parallel_results$run_1_Nsim_5000$Y.systematic[1:2,1:2]
           [,1]      [,2]
[1,] -0.3996827 0.2019272
[2,] -0.3996827 0.2019272
> parallel_results$run_2_Nsim_5000$Y.systematic[1:2,1:2]
           [,1]        [,2]
[1,] -0.4580132 -0.08322681
[2,] -0.4580132 -0.08322681

Stability for same run across nsims: Z

> parallel_results$run_1_Nsim_5000$Z.Default[1:2,4998:5000]
          [,1]         [,2]       [,3]
[1,] -1.077092 -0.006225626 -0.6152041
[2,]  2.033250 -1.350659878 -0.7040274
> parallel_results$run_1_Nsim_10000$Z.Default[1:2,4998:5002]
          [,1]         [,2]       [,3]        [,4]      [,5]
[1,] -1.077092 -0.006225626 -0.6152041  0.57472772  2.593232
[2,]  2.033250 -1.350659878 -0.7040274 -0.08696669 -1.456689

Variability for different runs across same nsim: Z

> parallel_results$run_1_Nsim_5000$Z.Default[1:2,1:2]
              [,1]       [,2]
[1,]  0.0008058532 -1.0539683
[2,] -1.3029004815 -0.2031874
> parallel_results$run_2_Nsim_5000$Z.Default[1:2,1:2]
            [,1]      [,2]
[1,]  0.50913790 -1.120769
[2,] -0.03775884  0.894895

Question: are the nruns draws in the same core truly independent? I have checked the summary statistics and the correlations (see the sketch below), and they seem to be. Or have I inserted some undesired link through the set.seed() setting?
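
The empirical check, as a sketch reusing parallel_results from the code above:

## Compare two runs on the same core (same nsim).
z1 <- as.vector(parallel_results$run_1_Nsim_5000$Z.Default)
z2 <- as.vector(parallel_results$run_2_Nsim_5000$Z.Default)
cor(z1, z2)               ## close to 0 if the runs are empirically uncorrelated
summary(z1); summary(z2)  ## summary statistics look alike across runs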

Are there considerations linked to the RNG in R (and its interaction with the parallel package) that I have missed? I have searched Stack Overflow extensively but was unable to find anything matching my application.


1 Answer

Not sure if I understand you correctly, but you're doing something like this:

> library(doParallel)
> cl <- makeCluster(num_cores)
> registerDoParallel(cl)
> 
> foreach(core=1:3, .combine="list") %dopar% {
+   rn <- array(, c(2, 2))
+   for (run in seq_len(nrow(rn))) {
+     set.seed(41 + run)
+     rn[, run] <- rnorm(2)
+   }
+   rn
+ }
[[1]]
[[1]][[1]]
           [,1]        [,2]
[1,]  1.3709584 -0.03751376
[2,] -0.5646982 -1.57460441

[[1]][[2]]
           [,1]        [,2]
[1,]  1.3709584 -0.03751376
[2,] -0.5646982 -1.57460441


[[2]]
           [,1]        [,2]
[1,]  1.3709584 -0.03751376
[2,] -0.5646982 -1.57460441

which yields identical values on every worker. If that's not your intention, you might want to add the iteration index i to the seed, though that is not ideal: seeds are just entry points into one pseudo-random sequence, and nearby seeds do not guarantee independent streams. Note in the output below how the offsets 41 + run + i collide, so run 2 on worker 1 reproduces run 1 on worker 2:

> 
> stopCluster(cl)
> 
> library(parallel)
> cl <- makeCluster(num_cores)
> 
> parLapply(cl, 1:3, \(i) {
+   rn <- array(, c(2, 2))
+   for (run in seq_len(nrow(rn))) {
+     set.seed(41 + run + i)
+     rn[, run] <- rnorm(2)
+   }
+   rn
+ })
[[1]]
            [,1]       [,2]
[1,] -0.03751376 0.65391826
[2,] -1.57460441 0.01905227

[[2]]
           [,1]       [,2]
[1,] 0.65391826  0.3407997
[2,] 0.01905227 -0.7033403

[[3]]
           [,1]       [,2]
[1,]  0.3407997 -0.8989165
[2,] -0.7033403  0.2121311

> 
> stopCluster(cl)
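
By contrast, a minimal sketch of R's built-in stream-based approach: with the "L'Ecuyer-CMRG" generator, clusterSetRNGStream() gives each worker a statistically independent sub-stream, avoiding hand-picked seed offsets. (Independent streams would, however, not reproduce the shared first columns across cores that the question requires.)

library(parallel)
cl <- makeCluster(2)
## One independent L'Ecuyer-CMRG sub-stream per worker, all derived from iseed.
clusterSetRNGStream(cl, iseed = 123)
parLapply(cl, 1:2, function(i) rnorm(3))  ## reproducible, worker-independent draws
stopCluster(cl)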



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!