I am currently using the parallel package in R and I am trying to make my work reproducible by setting seeds.
However, setting the seed before creating the cluster and running the parallel tasks does not make the results reproducible. I think I need to set the seed on each core when I create the cluster.
I have made a small example here to illustrate my problem:
library(parallel)
# function to generate 2 uniform random numbers
runif_parallel <- function() {
  # make cluster of two cores
  cl <- parallel::makeCluster(2)
  # sample uniform random numbers
  samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i) runif(1))
  # close cluster
  parallel::stopCluster(cl)
  return(unlist(samples))
}
set.seed(41)
test1 <- runif_parallel()
set.seed(41)
test2 <- runif_parallel()
# they should be the same since they have the same seed
identical(test1, test2)
In this example, test1 and test2 should be the same, as they have the same seed, but they return different results.
Can I get some help with where I'm going wrong please?
Note that I've written this example the way I have to mimic how I'm using it right now - there are probably cleaner ways to generate two random uniform numbers in parallel.
You need to run set.seed within each job.
Here is a reproducible version:
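# make cluster of two cores
cl <- parallel::makeCluster(2)
# seed every worker (optional here: the per-task set.seed(i) below resets the RNG anyway)
parallel::clusterEvalQ(cl, set.seed(41))
# sample uniform random numbers, seeding each task with its own index
samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i){set.seed(i);runif(1)})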
samples
# [[1]]
# [1] 0.2655087
#
# [[2]]
# [1] 0.1848823
samples <- parallel::parLapplyLB(cl, X = 1:2, fun = function(i){set.seed(i);runif(1)})
samples
# [[1]]
# [1] 0.2655087
#
# [[2]]
# [1] 0.1848823
parallel::stopCluster(cl)
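As an alternative to calling set.seed inside every task, the parallel package provides clusterSetRNGStream, which gives each worker its own L'Ecuyer-CMRG random number stream derived from a single seed. The sketch below reworks the function from the question along those lines. The name runif_parallel_streams and its seed argument are made up for illustration. It also swaps parLapplyLB for parLapply, on the assumption that static scheduling is acceptable: with load balancing, which worker picks up which task can vary between runs, so per-worker streams alone would not guarantee identical results.

```r
library(parallel)

# Hypothetical variant of runif_parallel() that seeds the workers
# instead of relying on the master's set.seed().
runif_parallel_streams <- function(seed) {
  # make cluster of two cores
  cl <- parallel::makeCluster(2)
  # make sure the cluster is closed even if an error occurs
  on.exit(parallel::stopCluster(cl), add = TRUE)
  # give each worker its own reproducible RNG stream derived from `seed`
  parallel::clusterSetRNGStream(cl, iseed = seed)
  # static scheduling: task 1 goes to worker 1, task 2 to worker 2
  samples <- parallel::parLapply(cl, X = 1:2, fun = function(i) runif(1))
  unlist(samples)
}

test1 <- runif_parallel_streams(41)
test2 <- runif_parallel_streams(41)
# compare the two runs
identical(test1, test2)
```

If load balancing is required, seeding inside each task (as in the answer above) is the safer choice, because the result then depends only on the task index and not on which worker runs it.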