I need to parallelize some R code, and I plan to use the foreach package with the %dopar% operator.
I want to add columns to a data.frame, and I don't want foreach to print the result when the loop finishes.
Note that I am not asking for the best way to do this; the sample code only demonstrates what I am trying to do.
I have tried the code below, but it does not update the data.frame, and it also prints its return value.
library(foreach)
library(doParallel)
cl<-makeCluster(8)
registerDoParallel(cl)
data <- iris
foreach(i=1:(ncol(data)- 1)) %dopar% {
data[,paste0(names(data),"_1")] <- data[,i + 1]
}
I expect the loop to add a new column to the data.frame at every iteration, so that I end up with the data.frame plus 4 more columns.
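You can't alter the original data.frame from inside the loop: each worker receives its own copy of the data and runs in a separate environment, so assignments made there never reach your session.
Instead, build a new data.frame from the original one by returning each new column from the loop and combining the results. Assigning the value of foreach to a variable also keeps it from being printed.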
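library(foreach)
library(doParallel)
# Start 8 workers and register them as the %dopar% backend
cl <- makeCluster(8)
registerDoParallel(cl)
data <- iris
# .init = data seeds the combine step, so every returned column is bound onto the original
result <- foreach(i = 1:(ncol(data) - 1),
                  .init = data,
                  .combine = cbind) %dopar% {
  # drop = FALSE keeps a one-column data.frame, so the column name is preserved
  out <- data[, i + 1, drop = FALSE]
  colnames(out) <- paste0(colnames(out), "_1")
  return(out)
}
# Shut the workers down once the loop has finished
stopCluster(cl)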
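To see for yourself why the assignment in the question has no effect, here is a minimal sketch. The worker count of 2, the column name new_col and the variable worker_ncols are arbitrary choices for illustration, not part of the original code. Each worker adds a column to its own copy of data and reports how many columns it sees, while the data.frame in your session stays as it was.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)  # 2 workers is an arbitrary choice for this sketch
registerDoParallel(cl)

data <- iris

# Each worker modifies its own copy of `data`; only the value of the
# last expression in the block is sent back to the calling session.
worker_ncols <- foreach(i = 1:2, .combine = c) %dopar% {
  data[, "new_col"] <- i  # hypothetical column, exists only on the worker
  ncol(data)
}

stopCluster(cl)

worker_ncols  # column counts as seen on the workers after their assignment
ncol(data)    # column count in the calling session, untouched by the loop
```

If the two counts differ, the change stayed on the workers, which is why the new columns have to be returned from the loop body and combined afterwards.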