Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When is it worth using `remove` in R functions?

What factors should I consider when deciding whether or not to remove a variable that will not be used again in a function?

Here's a noddy example:

DivideByLower <- function (a, b) {
  if (a > b) {
    tmp <- a
    a <- b
    b <- tmp
    remove(tmp) # When should I include this line?
  }

  # Return:
  a / b
}

I understand that tmp will be removed when the function finishes executing, but should I ever be concerned about removing it earlier?

like image 391
Martin Smith Avatar asked Oct 26 '25 10:10

Martin Smith


1 Answers

From Hadley Wickham's advanced R :

In some languages, you have to explicitly delete unused objects for their memory to be returned. R uses an alternative approach: garbage collection (or GC for short). GC automatically releases memory when an object is no longer used. It does this by tracking how many names point to each object, and when there are no names pointing to an object, it deletes that object.

In the case you're describing garbage collection will release the memory.

In case the output of your function is another function, in which case Hadley names these functions respectively the function factory and the manufactured function, the variables created in the body of the function factory will be available in the enclosing environment of the manufactured function, and memory won't be freed.

More info, still in Hadley's book, can be found in the chapter about function factories.

function_factory <- function(x){
  force(x)
  y <- "bar"
  fun <- function(z){
    sprintf("x, y, and z are all accessible and their values are '%s', '%s', and '%s'",
            x, y, z)
  }
  fun
}

manufactured_function <- function_factory("foo")
manufactured_function("baz")
#> [1] "x, y, and z are all accessible and their values are 'foo', 'bar', and 'baz'"

Created on 2019-07-08 by the reprex package (v0.3.0)

In this case, if you want to control which variables are available in the enclosing environment, or be sure you don't clutter your memory, you might want to remove unnecessary objects, either by using rm / remove as you did, or as I tend to prefer, wrapped in an on.exit statement.

Another case in which I might use rm is if I want to access variables from a parent environment without risk of them being overriden inside of the function, but in that case it's often possible and cleaner to use eval.parent.

y <- 2
z <- 3
test0 <- function(x, var){
  y <- 1
  x + eval(substitute(var))
}

# opps, the value of y is the one defined in the body
test0(0, y)
#> [1] 1
test0(0, z)
#> [1] 3

# but it will work using eval.parent :
test1 <- function(x, var){
  y <- 1
  x + eval.parent(substitute(var))
}
test1(0, y)
#> [1] 2
test1(0, z)
#> [1] 3

# in some cases (better avoided), it can be easier/quick and dirty to do something like :
test2 <- function(x, var){
  y <- 1
  # whatever code using y
  rm(y)
  x + eval(substitute(var))
}
test2(0, y)
#> [1] 2
test2(0, z)
#> [1] 3

Created on 2019-07-08 by the reprex package (v0.3.0)

like image 72
Moody_Mudskipper Avatar answered Oct 29 '25 00:10

Moody_Mudskipper



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!