I'm trying to speed up the creation of a table with all possible combinations of two vectors. Base R provides this through expand.grid(). However, I was wondering whether we can get the same result faster using tools from the {collapse} package.
There is a Stack Overflow thread about this topic here. However, even the fastest solution offered there turns out to be the slowest of the three approaches in the benchmark below. Although tidyr::expand_grid() is faster than base R, I still hope that the collapse package can bring processing times down further.
#library(collapse)
#library(tidyr)
library(babynames)
year <- collapse::funique(babynames$year, sort = TRUE)
names <- collapse::funique(babynames$name)
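## from https://stackoverflow.com/a/10407457/6105259
expand.grid.jc <- function(seq1, seq2) {
  # cbind() builds a matrix; because `names` is character here,
  # the integer `year` values are coerced to character as well
  as.data.frame(cbind(Var1 = rep.int(seq1, length(seq2)),
                      Var2 = rep.int(seq2, rep.int(length(seq1), length(seq2)))))
}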
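The `expand.grid.jc()` helper goes through `cbind()`, which returns a matrix, so with a character `names` vector every `year` value is also converted to a string. That conversion plausibly accounts for much of its runtime here. Below is a minimal sketch of a list-based variant that keeps each column's type; the name `expand_grid_list()` is hypothetical (it is not from collapse or the linked answer), it assumes R >= 4.0.0 for `list2DF()`, and it has not been benchmarked against the functions above.

```r
# Hypothetical helper (not part of collapse or the linked answer):
# build the columns as a list so integer and character inputs keep
# their own types. Requires R >= 4.0.0 for list2DF().
expand_grid_list <- function(seq1, seq2) {
  n1 <- length(seq1)
  n2 <- length(seq2)
  list2DF(list(
    Var1 = rep.int(seq1, n2),    # seq1 varies fastest, as in expand.grid()
    Var2 = rep(seq2, each = n1)  # each seq2 value repeated n1 times
  ))
}

g <- expand_grid_list(1:3, c("a", "b"))
str(g)
```

Because no matrix is created, `Var1` stays integer and `Var2` stays character, matching `expand.grid(..., stringsAsFactors = FALSE)` column by column.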
my_benchmarking <-
  bench::mark(base  = expand.grid(year, names),
              jc    = expand.grid.jc(year, names),
              tidyr = tidyr::expand_grid(year, names),
              check = FALSE, iterations = 10)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
my_benchmarking
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 base 965.3ms 1.06s 0.938 701MB 2.35
#> 2 jc 13.1s 13.39s 0.0747 820MB 0.120
#> 3 tidyr 541.2ms 656.71ms 1.55 316MB 1.24
Created on 2021-08-22 by the reprex package (v2.0.0)
I would be happy to learn whether this task can be computed faster.
You could try the data.table::CJ() function.
bench::mark(base   = expand.grid(year, names),
            jc     = expand.grid.jc(year, names),
            tidyr1 = tidyr::expand_grid(year, names),
            tidyr2 = tidyr::crossing(year, names),
            dt     = data.table::CJ(year, names),
            check = FALSE, iterations = 10)
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
# 1 base       635.48ms 715.02ms     1.25      699MB    2.00     10    16      8.02s
# 2 jc            5.66s    5.76s     0.172     820MB    0.275    10    16     58.13s
# 3 tidyr1     195.03ms 268.97ms     4.01      308MB    2.00     10     5       2.5s
# 4 tidyr2     590.91ms 748.35ms     1.31      312MB    0.656    10     5      7.62s
# 5 dt          318.1ms 384.21ms     2.47      206MB    0.986    10     4      4.06s
PS: I also included tidyr::crossing() for comparison, since it produces the same combinations.
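All of the benchmarks above use `check = FALSE`, so bench::mark() never verifies that the candidates return equivalent results, and they do not return rows in the same order: expand.grid() varies its first argument fastest, while data.table::CJ() sorts its inputs by default (`sorted = TRUE`). A minimal base-R sketch for a spot check compares two grids as sets of rows, ignoring row order, column names and column types; the helper name `same_combinations()` is hypothetical and it assumes the separator `"\r"` never occurs in the data.

```r
# Hypothetical helper: TRUE when two grids contain the same rows,
# regardless of row order. Every column is turned into text by paste(),
# so an integer 1880 and a character "1880" compare as equal, and factor
# columns (expand.grid's default for strings) compare by their labels.
same_combinations <- function(a, b) {
  a <- as.data.frame(a)
  b <- as.data.frame(b)
  if (nrow(a) != nrow(b) || ncol(a) != ncol(b)) return(FALSE)
  # One key string per row; radix sort gives a locale-independent order
  row_keys <- function(d) {
    sort(do.call(paste, c(unname(as.list(d)), sep = "\r")), method = "radix")
  }
  identical(row_keys(a), row_keys(b))
}

# Same combinations, different row order (first column varying slowest)
x <- expand.grid(1:3, c("a", "b"))
y <- data.frame(year = rep(1:3, each = 2), name = rep(c("a", "b"), times = 3))
same_combinations(x, y)
```

Since this builds one string per row, on the full babynames grid it is memory-hungry; running it on subsets such as `head(year, 5)` and `head(names, 100)` keeps the check cheap.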