
Parallelize user-defined function using apply family in R

I have a script that takes too long to run, and I'm trying to parallelize its execution.

The script basically loops through each row of a data frame and performs some calculations, as shown below:

my.df = data.frame(id=1:9,value=11:19)

sumPrevious <- function(df,df.id){
    sum(df[df$id<=df.id,"value"])
}

for(i in 1:nrow(my.df)){
    print(sumPrevious(my.df,my.df[i,"id"]))
}
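As an aside, in this particular example the loop computes a running total of value ordered by id, so the same result can be obtained without calling a function per row at all. Below is a minimal sketch that assumes id is numeric. sumPreviousAll is a hypothetical helper name and is not part of the original question. It sorts by id, takes cumsum(), and then uses findInterval() to find, for each row, the last sorted position whose id is <= that row's id. That lookup also handles unsorted or duplicated ids.

```r
my.df <- data.frame(id = 1:9, value = 11:19)

# Hypothetical vectorised equivalent of calling sumPrevious() for every row.
sumPreviousAll <- function(df) {
  ord <- order(df$id)                  # row order that sorts the ids
  running <- cumsum(df$value[ord])     # running total of value in id order
  # For each row, index of the last sorted id that is <= that row's id
  # (with duplicated ids this picks the last tie, so ties are included).
  running[findInterval(df$id, df$id[ord])]
}

sumPreviousAll(my.df)
# returns 11 23 36 50 65 81 98 116 135
```

The loop re-scans the whole data frame for every row, so its cost grows quadratically with the number of rows. A vectorised approach like this may help more than parallelization on large inputs.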

I'm just starting to learn how to parallelize code in R, so I first want to understand how to do this with an apply-family function (e.g. sapply, lapply, mapply).

I've tried several things, but nothing has worked so far:

mapply(sumPrevious,my.df,my.df$id) # Error in df$id : $ operator is invalid for atomic vectors
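The error occurs because a data frame is a list of columns, so mapply iterates over my.df column by column. In the first call, df is the id column (a plain integer vector), and df$id fails. The minimal sketch below shows two sequential apply-family calls that iterate over the ids only and keep the data frame fixed:

```r
my.df <- data.frame(id = 1:9, value = 11:19)

sumPrevious <- function(df, df.id) {
  sum(df[df$id <= df.id, "value"])
}

# sapply iterates over the ids only; the anonymous function captures my.df
# instead of iterating over it.
res.sapply <- sapply(my.df$id, function(x) sumPrevious(my.df, x))

# mapply also works if the data frame goes through MoreArgs, which hands it
# unchanged to every call instead of splitting it into columns.
res.mapply <- mapply(sumPrevious, df.id = my.df$id, MoreArgs = list(df = my.df))

print(res.sapply)
print(res.mapply)
```

Because sapply takes the same X/FUN shape as lapply, this call structure carries over directly to the parallel counterparts of lapply.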
asked Oct 15 '25 at 14:10 by Victor


1 Answer

Using the parallel package in R, you can use the mclapply() function. You will need to adjust your code slightly so that sumPrevious takes a row index as its first argument, since that is what mclapply iterates over.

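library(parallel)
my.df = data.frame(id=1:9,value=11:19)

# i is the row index supplied by mclapply; df is passed through unchanged
sumPrevious <- function(i, df){
    df.id <- df$id[i]
    sum(df[df$id <= df.id, "value"])
}

no.of.cores <- detectCores()  # number of cores to use; adjust as needed
mclapply(X = 1:nrow(my.df), FUN = sumPrevious, df = my.df, mc.preschedule = TRUE, mc.cores = no.of.cores)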

This runs sumPrevious for each row index in parallel across no.of.cores cores of your machine and returns a list with one element per row.
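mclapply relies on forking, which is not available on Windows. On that platform mc.cores must be 1, so the code above would not run in parallel there. The minimal sketch below uses a portable alternative from the same parallel package, parLapply() with a cluster created by makeCluster(). It assumes a socket cluster of two workers, which is an arbitrary choice here.

```r
library(parallel)

my.df <- data.frame(id = 1:9, value = 11:19)

# Same row-index version of sumPrevious as in the answer above.
sumPrevious <- function(i, df) {
  df.id <- df$id[i]
  sum(df[df$id <= df.id, "value"])
}

# Start a socket (PSOCK) cluster; 2 workers is an arbitrary choice.
cl <- makeCluster(2)

# Arguments after the function are passed to every call, so df = my.df is
# shipped to the workers along with each chunk of row indices.
res <- parLapply(cl, seq_len(nrow(my.df)), sumPrevious, df = my.df)

# Shut the worker processes down when finished.
stopCluster(cl)

print(unlist(res))
```

Socket workers are separate R processes that start with an empty workspace, so anything the function needs has to be passed as an argument (as df is here) or exported with clusterExport().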

answered Oct 17 '25 at 03:10 by tushaR


