I have a script that takes too long to run, and I'm trying to parallelize its execution.
The script basically loops through each row of a data frame and performs some calculations, as shown below:
my.df <- data.frame(id = 1:9, value = 11:19)
sumPrevious <- function(df, df.id) {
  sum(df[df$id <= df.id, "value"])
}
for (i in 1:nrow(my.df)) {
  print(sumPrevious(my.df, my.df[i, "id"]))
}
I'm just starting to learn how to parallelize code in R, so I first want to understand how I could do this with an apply-like function (e.g. sapply, lapply, mapply).
I've tried multiple things but nothing worked so far:
mapply(sumPrevious,my.df,my.df$id) # Error in df$id : $ operator is invalid for atomic vectors
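The error most likely comes from the fact that a data frame is a list of columns, so mapply() iterates over the columns of my.df and hands sumPrevious one atomic vector at a time instead of the whole data frame. A minimal serial sketch of two ways around that, using the same data and function as above (the result names res.sapply and res.mapply are only illustrative):

```r
my.df <- data.frame(id = 1:9, value = 11:19)

sumPrevious <- function(df, df.id) {
  sum(df[df$id <= df.id, "value"])
}

# sapply: iterate over the ids only and pass the whole data frame explicitly
res.sapply <- sapply(my.df$id, function(id) sumPrevious(my.df, id))

# mapply: vectorise over df.id and pass the data frame unchanged via MoreArgs
res.mapply <- mapply(sumPrevious, df.id = my.df$id,
                     MoreArgs = list(df = my.df))

print(res.sapply)
print(res.mapply)
```

Both calls give the same values as the for loop. For this particular calculation, and assuming the ids are sorted in increasing order, cumsum(my.df$value) yields the same result without any loop.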
Using the parallel package in R, you can use the mclapply() function. You will need to adjust your code a little to make it run in parallel.
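library(parallel)
my.df <- data.frame(id = 1:9, value = 11:19)
# Take the row index as the first argument so mclapply() can iterate over it
sumPrevious <- function(i, df) {
  df.id <- df$id[i]
  sum(df[df$id <= df.id, "value"])
}
# no.of.cores must be defined before the call; detectCores() is one possible choice
no.of.cores <- detectCores()
# mclapply() forks the R process, so mc.cores > 1 is not supported on Windows
mclapply(X = seq_len(nrow(my.df)), FUN = sumPrevious, df = my.df, mc.preschedule = TRUE, mc.cores = no.of.cores)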
This code runs sumPrevious in parallel on up to no.of.cores cores of your machine and returns a list with one result per row; wrap the call in unlist() if you want a vector.
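Since mclapply() relies on forking, it will not run in parallel on Windows. A socket cluster with parLapply(), also from the parallel package, is a portable alternative. A minimal sketch, assuming two workers (the worker count and the names cl and res are illustrative choices, not part of the answer above):

```r
library(parallel)

my.df <- data.frame(id = 1:9, value = 11:19)

sumPrevious <- function(i, df) {
  df.id <- df$id[i]
  sum(df[df$id <= df.id, "value"])
}

# Assumption: two worker processes; adjust to your machine
cl <- makeCluster(2)

# The function and the data frame are shipped to the workers with the call
res <- parLapply(cl, seq_len(nrow(my.df)), sumPrevious, df = my.df)

# Always release the workers when done
stopCluster(cl)

res <- unlist(res)
print(res)
```

For a task as small as this one, the cost of starting workers and copying the data frame to them will probably outweigh any gain, so parallel execution only pays off when each call to sumPrevious does substantial work.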