
Alter data.frame object in foreach dopar loop

Tags:

r

I need to parallelize some R code and plan to use the foreach package with the %dopar% operator.

I want to add columns to a data.frame, and I don't want foreach to print the result when the loop finishes.

Note that I am not asking for the best way to do this; the sample code only demonstrates what I am trying to do.

I tried the code below, but it does not update the data.frame, and it also prints the result of the loop.

library(foreach)
library(doParallel)

cl<-makeCluster(8)
registerDoParallel(cl)

data <- iris

foreach(i=1:(ncol(data)- 1)) %dopar% {
  data[,paste0(names(data),"_1")] <- data[,i + 1]
}

I expect each iteration of the loop to add a new column to the data.frame, so that the loop returns the data.frame with 4 more columns.

asked Dec 20 '25 19:12 by PandaMan


1 Answer

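You can't alter the original data.frame from inside the loop: with %dopar%, data is copied to each worker, and each worker evaluates the loop body in its own environment, so any assignment changes only that worker's copy.

Instead, return the new column from each iteration and combine the pieces into a new data.frame based on the original one. Assigning the value of foreach to a variable also stops it from being printed.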

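library(foreach)
library(doParallel)

cl <- makeCluster(8)
registerDoParallel(cl)

data <- iris

# Each iteration returns one renamed column; cbind appends them to the
# original data.frame, which is supplied as the starting value via .init.
result <- foreach(i = 1:(ncol(data) - 1),
                  .init = data,
                  .combine = cbind) %dopar% {
  out <- data[, i + 1, drop = FALSE]  # drop = FALSE keeps a one-column data.frame
  colnames(out) <- paste0(colnames(out), "_1")
  out
}

stopCluster(cl)  # release the worker processes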
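
To see both behaviours side by side, here is a minimal sketch, assuming the foreach and doParallel packages are installed. The two-worker cluster and the names ignored and new_col are illustrative choices that do not appear in the original code. The first loop assigns to data inside the body and shows that the copy in the calling session is unchanged; the second returns each column and combines them.

```r
library(foreach)
library(doParallel)

# Two workers are enough for a demonstration (an arbitrary choice).
cl <- makeCluster(2)
registerDoParallel(cl)

data <- iris

# Assignments inside the loop body modify only the worker's copy of `data`.
# `ignored` and `new_col` are hypothetical names used for illustration.
ignored <- foreach(i = 1:2) %dopar% {
  data$new_col <- i
  NULL
}
ncol(data)    # still 5: the copy in the calling session is untouched

# Returning each new column and combining with cbind builds the wider data.frame.
result <- foreach(i = 1:(ncol(data) - 1),
                  .init = data,
                  .combine = cbind) %dopar% {
  out <- data[, i + 1, drop = FALSE]
  colnames(out) <- paste0(colnames(out), "_1")
  out
}
ncol(result)  # 9: the 5 original columns plus 4 new ones

stopCluster(cl)
```

Because both loops are assigned to variables, nothing is printed unless you ask for it explicitly.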

answered Dec 23 '25 08:12 by yusuzech


