
How to speed up `expand.grid()` in R?

Tags: performance, r

I'm trying to speed up the creation of a table with all possible combinations of two vectors. Base R provides this through expand.grid(). However, I was wondering whether we can get the same result faster using tools from the {collapse} package.

There has been a StackOverflow thread about this topic here. But even the fastest solution provided there turns out to be the slowest in the following case. Although tidyr::expand_grid() is faster than base R, I still hope that the collapse package can give faster processing times.

#library(collapse)
#library(tidyr)
library(babynames)

year  <- collapse::funique(babynames$year, sort = TRUE)
names <- collapse::funique(babynames$name)

expand.grid.jc <- function(seq1,seq2) { ## from https://stackoverflow.com/a/10407457/6105259
  as.data.frame(cbind(Var1 = rep.int(seq1, length(seq2)), 
                      Var2 = rep.int(seq2, rep.int(length(seq1),length(seq2)))))
}

my_benchmarking <- 
  bench::mark(base = expand.grid(year, names),
              jc = expand.grid.jc(year, names),
              tidyr = tidyr::expand_grid(year, names), check = FALSE, iterations = 10)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.

my_benchmarking
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 base        965.3ms    1.06s    0.938      701MB    2.35 
#> 2 jc            13.1s   13.39s    0.0747     820MB    0.120
#> 3 tidyr       541.2ms 656.71ms    1.55       316MB    1.24

Created on 2021-08-22 by the reprex package (v2.0.0)

I would be happy to learn whether this task can be computed faster.

Emman asked Oct 21 '25 13:10


1 Answer

You may try the data.table::CJ() function.

bench::mark(base = expand.grid(year, names),
            jc = expand.grid.jc(year, names),
            tidyr1 = tidyr::expand_grid(year, names), 
            tidyr2 = tidyr::crossing(year, names), 
            dt = data.table::CJ(year, names),
            check = FALSE, iterations = 10)

#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory  time   gc   
#  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>  <list> <lis>
#1 base       635.48ms 715.02ms     1.25      699MB    2.00     10    16      8.02s <NULL> <Rprof… <benc… <tib…
#2 jc            5.66s    5.76s     0.172     820MB    0.275    10    16     58.13s <NULL> <Rprof… <benc… <tib…
#3 tidyr1     195.03ms 268.97ms     4.01      308MB    2.00     10     5       2.5s <NULL> <Rprof… <benc… <tib…
#4 tidyr2     590.91ms 748.35ms     1.31      312MB    0.656    10     5      7.62s <NULL> <Rprof… <benc… <tib…
#5 dt          318.1ms 384.21ms     2.47      206MB    0.986    10     4      4.06s <NULL> <Rprof… <benc… <tib…
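
To see what CJ() returns compared with expand.grid(), here is a minimal sketch on two tiny vectors (the values are made up for illustration and are not taken from babynames). It relies on the documented `sorted` argument of CJ(): with the default `sorted = TRUE` the inputs are sorted and the result is keyed, while `sorted = FALSE` keeps the input order. Note also that CJ() varies the last column fastest, whereas expand.grid() varies the first column fastest, so the rows come out in a different order.

```r
library(data.table)

# Toy inputs, assumed for illustration only
year  <- c(2001L, 2000L)
names <- c("Mary", "Anna")

# Default: each input is sorted and the result gets a key
cj_sorted <- CJ(year = year, names = names)

# sorted = FALSE keeps the input order and skips the sorting step
cj_unsorted <- CJ(year = year, names = names, sorted = FALSE)

# Base R equivalent; here the first column varies fastest
eg <- expand.grid(year = year, names = names, stringsAsFactors = FALSE)

print(cj_sorted)
print(cj_unsorted)
print(eg)
```

If the row order does not matter for the downstream step, `sorted = FALSE` may be worth including in the benchmark as well, since it avoids the sort.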

PS - I also included tidyr::crossing() for comparison, as it produces the same combinations (it additionally de-duplicates and sorts its inputs).
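
A likely reason why expand.grid.jc() is so slow in the benchmarks above: cbind() on an integer vector and a character vector builds a single character matrix, so every year is converted to a string before as.data.frame() splits the matrix back into columns, and Var1 is no longer an integer. The sketch below avoids the matrix by building the columns directly. The name `expand_grid_df` is hypothetical, the inputs are toy values, and the function has not been benchmarked here, so treat it as an idea to test rather than a measured improvement.

```r
# Hypothetical helper, not from the original thread: same row order as
# expand.grid() (first argument varies fastest), but each column keeps its type.
expand_grid_df <- function(seq1, seq2) {
  n1 <- length(seq1)
  n2 <- length(seq2)
  data.frame(
    Var1 = rep.int(seq1, n2),     # seq1 repeated as a whole, n2 times
    Var2 = rep(seq2, each = n1),  # each element of seq2 repeated n1 times
    stringsAsFactors = FALSE
  )
}

# Toy inputs, assumed for illustration only
year  <- c(2000L, 2001L, 2002L)
names <- c("Mary", "Anna")

res <- expand_grid_df(year, names)
ref <- expand.grid(year, names, stringsAsFactors = FALSE)

# Same values as expand.grid(), and Var1 stays integer
stopifnot(identical(res$Var1, ref$Var1), identical(res$Var2, ref$Var2))
```

It could be added to the bench::mark() call as one more expression to check whether it helps on the full babynames vectors.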

Ronak Shah answered Oct 23 '25 05:10


