 

How can I vectorize the loop in this sapply call?

Tags:

r

I have found that the most expensive part of my R code is the following sapply call:

L <- 2000
score <- sample(1:3, L, replace = TRUE)   # L random end indices in 1..3
d <- c(0, -1, 0.5)
sapply(1:L, function(i) sum(d[1:score[i]]))   # sum of d[1], ..., d[score[i]] for each i

That call takes the sum of the vector d from index 1 to index score[i], looping over each element in the variable score. The challenge is that this code is evaluated as part of an optimization routine and run many, many times.
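For the d above, score[i] can only be 1, 2, or 3, so every element of the result is one of just three partial sums. A small sketch that lists them (the name partial is only for illustration):

```r
d <- c(0, -1, 0.5)

# Partial sum d[1] + ... + d[k] for every value score[i] can take
partial <- sapply(1:3, function(k) sum(d[1:k]))
partial
#> [1]  0.0 -1.0 -0.5
```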

I am trying to perform the same computation in a vectorized way, but struggling a bit. I suppose that I could create a matrix like this:

d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)

then somehow compute rowSums(d.mat), but summing only columns 1 through score[i] in row i. Is anyone aware of a way to do that without looping? I imagine that would be much faster than sapply, if it is possible at all, given the relative speed of rowSums in the following benchmark:

library(microbenchmark)
microbenchmark(sapply(1:L, function(i) sum(d[1:score[i]])), 
               rowSums(d.mat),
               times = 100)
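The matrix idea described above can be completed without a loop by zeroing out the unwanted columns before calling rowSums. This is a sketch, not a benchmarked recommendation: the names mask, masked, and loop are hypothetical, and the approach still does L × length(d) multiplications, so it avoids the per-element function calls of sapply but not the extra arithmetic.

```r
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)

# col(d.mat)[i, j] is j; comparing with score (length L) recycles score
# down each column, so mask[i, j] is TRUE exactly when j <= score[i]
mask <- col(d.mat) <= score

# Multiply by the logical mask (TRUE -> 1, FALSE -> 0), then sum each row
masked <- rowSums(d.mat * mask)

# Compare against the original sapply version
loop <- sapply(1:L, function(i) sum(d[1:score[i]]))
all.equal(masked, loop)
#> [1] TRUE
```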

Or perhaps someone sees a better third option.

asked Oct 11 '25 at 10:10 by dhc



1 Answer

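Index the cumulative sum. Because sum(d[1:k]) equals cumsum(d)[k], you can compute cumsum(d) once and subset it with the whole score vector: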

microbenchmark::microbenchmark(
  sapply = sapply(1:L, function(i) sum(d[1:score[i]])),
  index = cumsum(d)[score],
  check = "equal"
)
#> Unit: microseconds
#>    expr    min      lq     mean  median     uq    max neval
#>  sapply 2494.8 2698.00 3232.279 2805.35 3516.2 6868.4   100
#>   index    4.3    5.05    8.682    6.90    8.9   60.2   100
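If this runs inside an optimization routine where d changes between calls, the cumsum has to be recomputed each time, which is cheap because d is short. A minimal sketch of a reusable wrapper, where partial_sums is a hypothetical name; prepending 0 is an assumed design choice so that a score of 0 maps to an empty sum instead of silently shortening the result (note that the original sapply returns d[1] for a score of 0, because 1:0 counts down):

```r
d <- c(0, -1, 0.5)
score <- c(3, 1, 2, 2, 3)

# Hypothetical helper: element k + 1 of c(0, cumsum(d)) is d[1] + ... + d[k],
# so a single vectorized index replaces one sum() call per element of score
partial_sums <- function(d, score) {
  c(0, cumsum(d))[score + 1]
}

partial_sums(d, score)
#> [1] -0.5  0.0 -1.0 -1.0 -0.5
```

For scores in 1..length(d), this gives the same values as cumsum(d)[score].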
answered Oct 13 '25 at 23:10 by jblood94
