I hope someone will be able to help me with this problem. I have a list object that contains 48 vectors, each with 2,000,000 observations. Here is code that creates the same structure, but with only 100,000 items per vector:
mtx_sim <- matrix(data = runif(48 * 100000), ncol = 48, nrow = 100000)
mtx_list <- as.list(data.frame(mtx_sim))
I want to cumulatively sum the vectors in the list row by row. However, there is one stipulation: I only want to sum over the most recent thirty vectors. For example, the 35th result should be the sum of vectors 6 through 35 (the 35th vector plus the 29 preceding ones), while the fourth result should be the sum of vectors 1 through 4 (the fourth vector plus the three preceding ones: three, two, and one). Here is my code, which relies on lapply in combination with rowSums and is relatively slow:
# window boundaries: result x sums over vectors start[x] through end[x]
start <- c(rep(1, times = 30), seq(2, 19, 1))
end <- seq(1, 48, 1)

system.time(
  xxx <- lapply(1:48, function(x)
    rowSums(
      matrix(
        unlist(mtx_list[start[x]:end[x]]),
        ncol = end[x] - start[x] + 1))))
#    user  system elapsed
#   62.19    0.56   63.04
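For clarity, the start and end vectors above define the window for each position; a quick check of which vectors a few of the windows cover:

cbind(start, end)[c(4, 31, 35), ]
#      start end
# [1,]     1   4
# [2,]     2  31
# [3,]     6  35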
Does anyone have an idea of how to optimize this code?
You are doing two expensive things in an otherwise reasonable algorithm: on every iteration you unlist part of the list and rebuild a matrix from it, and you recompute the full rowSums for each window instead of reusing the previous result.
Here is an alternative: we reconstitute the original matrix once, and then just add (and, once the window is full, subtract) the marginal columns.
fun_brodie <- function(mtx_list) {
  mtx <- do.call(cbind, mtx_list)   # reconstitute the full matrix once
  base <- mtx[, 1]
  res <- list(base)
  # add the incoming column; once the window holds 30 columns, subtract the outgoing one
  for(i in seq(ncol(mtx))[-1])
    res[[i]] <- res[[i - 1]] + mtx[, i] - if(i > 30) mtx[, i - 30] else 0
  res
}
res <- fun_brodie(mtx_list)
Confirm equal:
all.equal(res, xxx)
# [1] TRUE
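For completeness, fun_op below is a minimal sketch that simply wraps the lapply/rowSums approach from the question into a function (fun_marat refers to Marat's answer and is not reproduced here):

fun_op <- function(mtx_list) {
  # same windowed rowSums approach as in the question
  start <- c(rep(1, times = 30), seq(2, 19, 1))
  end <- seq(1, 48, 1)
  lapply(1:48, function(x)
    rowSums(
      matrix(
        unlist(mtx_list[start[x]:end[x]]),
        ncol = end[x] - start[x] + 1)))
}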
Benchmarks:
library(microbenchmark)
microbenchmark(times=3, fun_marat(mtx_list), fun_brodie(mtx_list), fun_op(mtx_list))
Produces:
Unit: milliseconds
                 expr        min        lq       mean
  fun_marat(mtx_list)  1661.9135  1763.418  1800.3530
 fun_brodie(mtx_list)   115.7877   116.061   153.6794
     fun_op(mtx_list) 58059.7803 60388.303 62060.5557
Thanks to Marat for pointing out an interpretation error on my part. Also, note that in order to make fun_marat compatible I added a step that cbinds the list into a data frame.
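That preparation step might look something like the following sketch (assuming Marat's function expects a data frame rather than a list):

# assumed conversion step so fun_marat can operate on a data frame
mtx_df <- as.data.frame(do.call(cbind, mtx_list))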