I have found that the most expensive part of my R code is the following sapply call:
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)
sapply(1:L, function(i) sum(d[1:score[i]]))
That call sums the vector d from index 1 to index score[i], looping over each element of score. The challenge is that this code is evaluated inside an optimization routine and run many, many times.
I am trying to perform the same computation in a vectorized way, but struggling a bit. I suppose that I could create a matrix like this:
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)
and then somehow compute rowSums(d.mat), but only from column 1 to column score[i] in each row i. Is anyone aware of a way to do that without looping? I imagine it would be much faster than sapply, if it is possible at all, given the relative speed of rowSums in the following benchmark:
library(microbenchmark)
microbenchmark(sapply(1:L, function(i) sum(d[1:score[i]])),
               rowSums(d.mat),
               times = 100)
Or perhaps someone sees a better third option.
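For what it's worth, the matrix idea above can be completed without a loop by masking out the columns past score[i] before summing. This is only a sketch of that route (it still materializes the full L-by-3 matrix, so it is not necessarily the fastest option):

```r
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)
d.mat <- matrix(rep(d, L), nrow = L, byrow = TRUE)

# col(d.mat) gives each element's column index; comparing it with score
# recycles score down the rows (column-major), so mask[i, j] is (j <= score[i]).
mask <- col(d.mat) <= score
res <- rowSums(d.mat * mask)

# Should match the sapply version:
stopifnot(isTRUE(all.equal(res, sapply(1:L, function(i) sum(d[1:score[i]])))))
```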
Index the cumsum:
microbenchmark::microbenchmark(
sapply = sapply(1:L, function(i) sum(d[1:score[i]])),
index = cumsum(d)[score],
check = "equal"
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> sapply 2494.8 2698.00 3232.279 2805.35 3516.2 6868.4 100
#> index 4.3 5.05 8.682 6.90 8.9 60.2 100
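Why this works: sum(d[1:k]) is exactly the k-th running total of d, so precomputing cumsum(d) once and indexing it with score returns every partial sum in a single vectorized step. A quick equivalence check:

```r
L <- 2000
score <- sample(1:3, L, replace = TRUE)
d <- c(0, -1, 0.5)

# cumsum(d) holds all partial sums of d; indexing by score picks
# the score[i]-th partial sum for each i, with no per-element loop.
loop_version <- sapply(1:L, function(i) sum(d[1:score[i]]))
vec_version  <- cumsum(d)[score]

stopifnot(isTRUE(all.equal(loop_version, vec_version)))
```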