Why is my implementation of rgeom faster than R's?

Tags: performance, r

I implemented a my_rgeom function using the fact that if X ~ Geom(p) and Y ~ Exp(−log(1 − p)), then Z = ceiling(Y) has the same distribution as X. Using the inverse-transform method, I can define my function as:

my_rgeom <- function(N, p) {
    U <- runif(N)
    Y <- log(U) / log(1 - p)  # inverse transform: Y ~ Exp(-log(1 - p))

    return(ceiling(Y))
}
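
(For reference, the claim is straightforward to verify: with rate λ = −log(1 − p) and k = 1, 2, ...,

P(ceiling(Y) = k) = P(k − 1 < Y ≤ k)
                  = e^(−λ(k − 1)) − e^(−λk)
                  = (1 − p)^(k − 1) − (1 − p)^k
                  = (1 − p)^(k − 1) · p,

which is exactly the Geom(p) pmf on {1, 2, ...}.)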

Why doesn't R's implementation use, for example, the inverse-transform method, since it's much faster?

Here are the results of benchmarking this against the standard rgeom function:

library(microbenchmark)
library(ggplot2)
library(gridExtra)

N <- 10^6
prob <- c(0.1, 0.2, 0.3, 0.4, 0.5) 

results_list <- list()

for (p in prob) {
    benchmark_results <- microbenchmark(
        my_rgeom(N, p),
        rgeom(N, p),
        times = 100  
    )
    
    # Store the results in the list
    results_list[[as.character(p)]] <- benchmark_results
}
# Plot the results for each value of p
plots <- lapply(prob, function(p) {
    # autoplot.microbenchmark has no title argument, so add one via ggtitle()
    autoplot(results_list[[as.character(p)]]) + ggtitle(paste("p =", p))
})

# Display the plots
gridExtra::grid.arrange(grobs = plots)

[Plots of the benchmark distributions for each value of p, produced by the code above.]

And the output of results_list:

results_list 
$`0.1`
Unit: milliseconds
           expr      min       lq      mean    median       uq      max neval
 my_rgeom(N, p)  63.2907  69.1584  76.95829  73.62385  78.8459 172.4273   100
    rgeom(N, p) 115.0211 123.7101 131.01782 127.05655 134.2484 191.6843   100

$`0.2`
Unit: milliseconds
           expr      min       lq      mean    median        uq      max neval
 my_rgeom(N, p)  63.4618  76.8853  87.02115  85.91125  94.29075 195.5412   100
    rgeom(N, p) 117.3709 130.1152 142.02507 135.24175 148.68975 274.5087   100

$`0.3`
Unit: milliseconds
           expr      min        lq      mean   median       uq      max neval
 my_rgeom(N, p)  66.2584  77.31255  85.50857  81.4676  88.8424 178.2553   100
    rgeom(N, p) 115.3894 129.59360 139.56611 135.2741 146.1852 187.7755   100

$`0.4`
Unit: milliseconds
           expr      min       lq      mean   median       uq      max neval
 my_rgeom(N, p)  68.1903  75.1489  83.53151  80.1580  86.8057 180.0446   100
    rgeom(N, p) 116.3780 121.7696 132.27215 126.9434 137.8039 213.7012   100

$`0.5`
Unit: milliseconds
           expr      min        lq      mean   median       uq      max neval
 my_rgeom(N, p)  70.0728  73.63675  82.20571  79.3234  84.6281 178.6583   100
    rgeom(N, p) 116.5077 119.13275 126.49830 124.5081 130.1617 187.6423   100
asked Sep 07 '25 by ScapeProf


2 Answers

The R implementation of rgeom appears to treat the geometric distribution as a special case of the negative binomial distribution, which relies on Poisson random variates. I can see no reason not to use inverse transform sampling, as it is faster and can be made just as numerically stable as rgeom.
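
For context, that mixture representation can be sketched directly in R. This is only an illustrative sketch (rgeom_mix is a hypothetical name), not R's actual implementation, which does the equivalent draw in C:

# Sketch: X | L ~ Poisson(L) with L ~ Exp(rate = p / (1 - p)) gives
# X ~ Geom(p) on {0, 1, 2, ...} (the gamma-Poisson negative binomial
# mixture with size = 1)
rgeom_mix <- function(N, p) {
  rpois(N, rexp(N, rate = p / (1 - p)))
}
mean(rgeom_mix(1e6, 0.2))  # close to (1 - 0.2) / 0.2 = 4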

However, I would suggest a couple of modifications to my_rgeom. First, -rexp(N) will be faster than log(runif(N)). Second, to better handle very small values of p, log1p(-p) is preferred over log(1 - p), as mentioned in the comments.

As a side note, rgeom has support starting at 0, so I will use floor instead of ceiling.
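
A quick spot check of the support difference, using my_rgeom from the question:

min(rgeom(1e5, 0.5))     # 0 almost surely: rgeom's support starts at 0
min(my_rgeom(1e5, 0.5))  # 1 almost surely: the ceiling-based version starts at 1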

The following code demonstrates the above.

First define some candidate functions:

set.seed(1845088186)

rgeom1 <- function(N, p) {
  U <- runif(N)
  Y <- log(U) / log(1 - p)
  return(floor(Y))
}

rgeom2 <- function(N, p) {
  floor(-rexp(N)/log(1 - p))
}

rgeom3 <- function(N, p) {
  floor(-rexp(N)/log1p(-p))
}

rgeom_all <- list(rgeom, rgeom1, rgeom2, rgeom3)

Test that they all return the same distribution.

(p <- runif(1))
#> [1] 0.08180412
x <- lapply(rgeom_all, \(f) f(1e6, p))

suppressWarnings(sapply(x[-1], \(y) ks.test(y, x[[1]])$p.value))
#> [1] 0.7088201 0.7734314 0.7856763
mapply(summary, x)
#>              [,1]      [,2]      [,3]      [,4]
#> Min.      0.00000   0.00000   0.00000   0.00000
#> 1st Qu.   3.00000   3.00000   3.00000   3.00000
#> Median    8.00000   8.00000   8.00000   8.00000
#> Mean     11.23687  11.23102  11.21926  11.21025
#> 3rd Qu.  16.00000  16.00000  16.00000  16.00000
#> Max.    203.00000 187.00000 146.00000 171.00000

Compare timings:

p <- runif(1e6)

microbenchmark::microbenchmark(
  rgeom = rgeom(1e6, p),
  rgeom1 = rgeom1(1e6, p),
  rgeom2 = rgeom2(1e6, p),
  rgeom3 = rgeom3(1e6, p)
)
#> Unit: milliseconds
#>    expr     min       lq      mean    median       uq      max neval
#>   rgeom 82.3725 91.00560 100.98802 100.48860 108.7335 146.5782   100
#>  rgeom1 64.3914 74.54830  82.31284  80.25625  87.1611 123.3017   100
#>  rgeom2 55.6838 60.14365  69.41881  66.87925  75.9065 143.6629   100
#>  rgeom3 53.8316 58.36640  65.48733  64.77050  70.4816 102.4296   100

Note that log1p(-p) is slightly faster than log(1 - p).

However, the behavior of rgeom differs from that of the other three implementations when the variates are in the integer range:

sapply(rgeom_all, \(f) typeof(f(1, 0.5)))
#> [1] "integer" "double"  "double"  "double"
sapply(rgeom_all, \(f) typeof(f(1, .Machine$double.xmin)))
#> [1] "double" "double" "double" "double"

They also differ as to how they handle some invalid inputs:

rgeom(1, -0.5)
#> Warning in rgeom(1, -0.5): NAs produced
#> [1] NA
rgeom1(1, -0.5)
#> [1] -2
rgeom2(1, -0.5)
#> [1] -2
rgeom3(1, -0.5)
#> [1] -1

rgeom(1, 1.1)
#> Warning in rgeom(1, 1.1): NAs produced
#> [1] NA
rgeom1(1, 1.1)
#> Warning in log(1 - p): NaNs produced
#> [1] NaN
rgeom2(1, 1.1)
#> Warning in log(1 - p): NaNs produced
#> [1] NaN
rgeom3(1, 1.1)
#> Warning in log1p(-p): NaNs produced
#> [1] NaN

rgeom(1, NA)
#> Warning in rgeom(1, NA): NAs produced
#> [1] NA
rgeom1(1, NA)
#> [1] NA
rgeom2(1, NA)
#> [1] NA
rgeom3(1, NA)
#> [1] NA

The three candidate functions can be modified to mirror* the behavior of rgeom with some impact on performance:

(*Closely enough; fully and faithfully mirroring the behavior of rgeom in all cases would eliminate or even reverse the performance advantage without resorting to C code.)

rgeom1 <- function(N, p) {
  # match rgeom: a single invalid p yields integer NAs
  if (length(p) == 1L && p[1] < 0) return(rep(NA_integer_, N))
  y <- log(runif(N))/log(1 - p)
  y[p < 0] <- NA_real_  # flag invalid entries of a vectorized p
  # match rgeom's return type: integer unless the variates overflow it
  if (max(y, na.rm = 1) > .Machine$integer.max) floor(y) else as.integer(y)
}

rgeom2 <- function(N, p) {
  if (length(p) == 1L && p[1] < 0) return(rep(NA_integer_, N))
  y <- -rexp(N)/log(1 - p)
  y[p < 0] <- NA_real_
  if (max(y, na.rm = 1) > .Machine$integer.max) floor(y) else as.integer(y)
}

rgeom3 <- function(N, p) {
  if (length(p) == 1L && p[1] < 0) return(rep(NA_integer_, N))
  y <- -rexp(N)/log1p(-p)
  y[p < 0] <- NA_real_
  if (max(y, na.rm = 1) > .Machine$integer.max) floor(y) else as.integer(y)
}

Testing:

rgeom(1, -0.5)
#> Warning in rgeom(1, -0.5): NAs produced
#> [1] NA
rgeom1(1, -0.5)
#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA
rgeom2(1, -0.5)
#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA
rgeom3(1, -0.5)
#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA

rgeom(1, 1.1)
#> Warning in rgeom(1, 1.1): NAs produced
#> [1] NA
rgeom1(1, 1.1)
#> Warning in log(1 - p): NaNs produced
#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA
rgeom2(1, 1.1)
#> Warning in log(1 - p): NaNs produced

#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA
rgeom3(1, 1.1)
#> Warning in log1p(-p): NaNs produced

#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA

rgeom(1, NA)
#> Warning in rgeom(1, NA): NAs produced
#> [1] NA
rgeom1(1, NA)
#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA
rgeom2(1, NA)
#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA
rgeom3(1, NA)
#> Warning in max(y, na.rm = 1): no non-missing arguments to max; returning -Inf
#> [1] NA

Timing:

p <- runif(1e6, -0.1, 1.1)
p[sample(1e6, 1e3)] <- NA

microbenchmark::microbenchmark(
  rgeom = rgeom(1e6, p),
  rgeom1 = rgeom1(1e6, p),
  rgeom2 = rgeom2(1e6, p),
  rgeom3 = rgeom3(1e6, p)
)
#> Unit: milliseconds
#>    expr     min       lq     mean   median       uq      max neval
#>   rgeom 70.8633 72.26875 74.56052 73.36910 75.60830  88.3045   100
#>  rgeom1 65.1287 69.38690 74.28986 71.11060 73.09685 114.2515   100
#>  rgeom2 58.2231 62.40600 65.61609 64.12870 65.65905  99.9390   100
#>  rgeom3 58.9580 63.21480 68.88956 64.88055 66.62870 116.3178   100

However, for very small values of p, rgeom1 and rgeom2 are not well behaved:

sapply(rgeom_all, \(f) sum(is.finite(f(1e6, .Machine$double.xmin))))
#> Warning in f(1e+06, .Machine$double.xmin): NAs produced
#> [1] 981778      0      0 981709
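
The root cause is floating-point rounding: at p = .Machine$double.xmin, 1 - p rounds back to exactly 1, so log(1 - p) is exactly 0 and every variate becomes -Inf. A quick sketch:

p <- .Machine$double.xmin
1 - p == 1   # TRUE: 1 - p rounds to exactly 1 in double precision
log(1 - p)   # exactly 0, so -rexp(N) / log(1 - p) is -Inf for every draw
log1p(-p)    # -2.225074e-308: log1p sidesteps the rounding
# Even with log1p, rexp(N) / 2.225074e-308 overflows to Inf whenever the
# exponential draw exceeds ~4 (about exp(-4), i.e. 1.8% of draws), matching
# the non-finite count for rgeom3 above.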

rgeom3 is well behaved over all values of p in addition to being the fastest of the four implementations.

answered Sep 10 '25 by jblood94


@jblood94's answer is great; I'm going to try to address the "why?" part of this from a historical perspective.

  • the rgeom code is 26 years old (see git blame); in 1997, people were much less likely to be concerned specifically about the time taken to generate 1 million geometric deviates (even today, you need to be generating a lot of deviates before the difference between 75 and 150 milliseconds per million becomes important ...)
  • ?rgeom refers to Devroye (1986) Non-Uniform Random Variate Generation p. 480:

‘rgeom’ uses the derivation as an exponential mixture of Poissons, see Devroye ...

p. 480 seems as though it might be a typo (that page is in the exercises on continuous univariate distributions): maybe it should have been p. 488 (which discusses the negative binomial as a mixture of Poissons; the geometric is a special case)?

It's a little weird, and presumably an oversight, because pages 499-500 specifically discuss faster generation of geometric deviates.

If you can make a reasonable case for the value of this improvement (i.e. is it worth R-core's time to pull this method into base R, given the potential backward-compatibility issues?), it would be worth raising on the r-devel mailing list ...

answered Sep 10 '25 by Ben Bolker