I have a rather large dataset organized in a list like so:
set.seed(0)
v <- rnorm(5000)
names(v) <- seq(1001, 6000, 1)
dates <- seq.Date(as.Date('2023-01-01'), by='day', length.out=365)
ls <- list()
ls <- sapply(dates, function(d) {ls[[length(ls) + 1]] <- v; ls})
names(ls) <- dates
str(ls[1:5])
List of 5
$ 2023-01-01: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
$ 2023-01-02: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
$ 2023-01-03: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
$ 2023-01-04: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
$ 2023-01-05: Named num [1:5000] 1.263 -0.326 1.33 1.272 0.415 ...
..- attr(*, "names")= chr [1:5000] "1001" "1002" "1003" "1004" ...
As you can see, this is 5000 data points for each day of the year totaling 1,825,000 data points. Let's call them "x". I want to perform the following operation on EACH data point, x: max(c(0.5 - x, 0)). The following code works but takes a really long time.
new <- sapply(names(ls), function(d)
lapply(names(v), function(n) max(c(0.5 - ls[[d]][n], 0))))
rownames(new) <- names(v)
new[1:5, 1:5]
2023-01-01 2023-01-02 2023-01-03 2023-01-04 2023-01-05
1001 0 0 0 0 0
1002 0.8262334 0.8262334 0.8262334 0.8262334 0.8262334
1003 0 0 0 0 0
1004 0 0 0 0 0
1005 0.08535857 0.08535857 0.08535857 0.08535857 0.08535857
Is there a faster way?
Instead of applying lapply, use pmax
new <- sapply(ls, \(v) pmax(0.5 - v, 0))
You can try:
m <- 0.5 - do.call(cbind, ls)
m[m < 0] <- 0
Which gives:
head(m, c(5,5))
2023-01-01 2023-01-02 2023-01-03 2023-01-04 2023-01-05
1001 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1002 0.82623336 0.82623336 0.82623336 0.82623336 0.82623336
1003 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1004 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
1005 0.08535857 0.08535857 0.08535857 0.08535857 0.08535857
Benchmarks:
bench::mark(a = sapply(ls, \(v) pmax(0.5 - v, 0)),
b = {
m <- do.call(cbind, ls)
pmax(0.5 - m, 0)
},
c = {
m <- 0.5 - do.call(cbind, ls)
m[m < 0] <- 0
m
})
# A tibble: 3 × 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 a 247.6ms 248ms 4.04 305MB 4.04 1 1 248ms <dbl[…]> <Rprofmem> <bench_tm> <tibble>
2 b 113.1ms 114ms 8.77 229MB 0 5 0 570ms <dbl[…]> <Rprofmem [4 × 3]> <bench_tm> <tibble>
3 c 99.5ms 106ms 9.16 165MB 0 5 0 546ms <dbl[…]> <Rprofmem [5 × 3]> <bench_tm> <tibble>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With