 

R attribute ".Environment" consuming large amounts of RAM in nnet package

Tags: r, attributes, nnet

I have a piece of code that uses the nnet package. I want to fit a number of different neural network models and then save all of them to disk (with save()).

The issue I am running into is that the "terms" element of each fitted network has an attribute ".Environment" that ends up being hundreds of megabytes, whereas the rest of the model is only a few kilobytes (once the fitted values and residuals are deleted).

Further, deleting the ".Environment" attribute does not appear to cause any problem when using the model with predict().

Does anyone have any idea what either R or nnet is doing with this attribute? Has anyone seen anything like this?

asked Oct 21 '25 15:10 by chuck taylor


1 Answer

tl;dr: removing the attribute is OK, except in some very special cases.

Background

The .Environment attribute of a formula contains a reference to the context (the environment) in which the formula was created, much as an R function (closure) keeps a reference to the environment in which it was defined. An R environment is a store holding the values of variables, similar to a named list. This reference allows the formula to refer to those variables, for example:

    > f = function(g) return(y ~ g(x))
    > form = f(exp)
    > lm(form, list(y=1:10, x=log(1:10)))
    ...
    Coefficients:
    (Intercept)     g(x)
    3.37e-15        1.00e+00

In this example, the formula form is effectively y ~ exp(x), because g is given the value exp. To find the value of g (which is an argument to function f), the formula needs to hold a reference to the environment created by the call to f.
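To see what the formula is holding on to, you can inspect its environment directly. This small sketch repeats f and form from the example above so it runs on its own; ls() lists the variables stored in the environment captured by the call f(exp):

```r
# Recreate the example: the formula is created inside a call to f()
f <- function(g) return(y ~ g(x))
form <- f(exp)

env <- environment(form)
ls(env)                        # the only variable in f's call frame: "g"
identical(env$g, exp)          # TRUE: g was bound to exp in that call
identical(env, globalenv())    # FALSE: not the global environment
```

Note that `env$g` forces the lazily evaluated argument, which is exactly what model.frame() does when it evaluates g(x) during the fit.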

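You can see the environment attached to a formula by using the attributes() or environment() functions. For a formula created at the top level, it is the global environment:

    > attributes(y ~ x)
    $class
    [1] "formula"

    $.Environment
    <environment: R_GlobalEnv>

    > environment(y ~ x)
    <environment: R_GlobalEnv>

For form from the example above, the same calls instead print the unnamed environment created by the call to f, shown as a hexadecimal address.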

Your question

I assume you are using the formula interface of nnet() (rather than the x/y matrix interface), i.e.:

    > nnet(y ~ x1 + x2, ...)

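Unfortunately, R keeps the entire environment (including all the variables defined where your formula was created) allocated, even if your formula does not refer to any of it. There is no easy way for the language to tell which parts of the environment you may or may not be using. When the model is saved with save(), that environment is written out along with it (unless it is the global environment, which is stored only as a reference), which is how a model of a few kilobytes can take hundreds of megabytes on disk.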
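A minimal sketch of this effect, using a hypothetical helper make_formula() that holds a large local vector the formula never mentions. serialize() produces the same kind of byte stream that save() writes to disk, so comparing serialized sizes shows roughly what the captured environment costs:

```r
# Hypothetical helper: builds a formula inside a function that also
# holds a large local object the formula never uses.
make_formula <- function() {
  big <- numeric(1e6)   # 1e6 doubles, about 8 MB
  y ~ x                 # captures the call frame, including 'big'
}

form <- make_formula()
ls(environment(form))                        # "big" is still reachable
size_with_env <- length(serialize(form, NULL))

environment(form) <- NULL                    # drop the captured frame
size_without_env <- length(serialize(form, NULL))

c(size_with_env, size_without_env)           # megabytes vs. a tiny fraction of that
```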

One solution is to explicitly retain only the required parts of the environment. In particular, if your formula does not refer to anything in the environment (which is the most common case), it is safe to remove it, as long as all the variables are supplied through the data argument.
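Here is a sketch of the "retain only what is needed" approach, applied to a variant of f from the example above that also holds a large local object. The name big and the choice of globalenv() as the parent of the new environment are assumptions for illustration; the formula gets a fresh environment containing only g:

```r
f <- function(g) {
  big <- numeric(1e6)                      # unrelated local data we do not want to keep
  form <- y ~ g(x)
  keep <- new.env(parent = globalenv())    # small replacement environment
  keep$g <- g                              # copy only the variable the formula uses
  environment(form) <- keep
  form
}

form <- f(exp)
ls(environment(form))                      # only "g" is retained
fit <- lm(form, data = data.frame(y = 1:10, x = log(1:10)))
coef(fit)                                  # slope for g(x) is (numerically) 1
```

Using globalenv() as the parent keeps lookups of ordinary functions working, and because the global environment is serialized by reference it adds nothing to the saved size.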

I would suggest removing the environment from your formula before you call nnet, something like this:

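    library(nnet)
    form = y ~ x + z
    environment(form) = NULL   # drop the reference to the defining environment
    ...
    # the variables must now come from the 'data' argument, since they can
    # no longer be looked up through the formula's environment
    result = nnet(form, ...)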
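An end-to-end sketch with nnet itself, under these assumptions: the data are simulated, fit_model() is a hypothetical wrapper standing in for code that fits many models inside a function, and size = 2 / linout = TRUE are arbitrary choices for a small regression network. It compares the serialized size of a model fitted with and without the formula's environment, and checks that predict() still works on the stripped model when newdata is supplied:

```r
library(nnet)  # recommended package shipped with standard R installations

# Hypothetical wrapper: fits a model inside a function that also holds a
# large, unrelated local object, mimicking code that fits many models.
fit_model <- function(dat, drop_env) {
  junk <- numeric(1e6)                    # about 8 MB, never used by the model
  form <- y ~ x1 + x2
  if (drop_env) environment(form) <- NULL # strip the captured frame
  nnet(form, data = dat, size = 2, linout = TRUE, trace = FALSE)
}

set.seed(1)
dat <- data.frame(x1 = runif(50), x2 = runif(50))
dat$y <- dat$x1 + dat$x2 + rnorm(50, sd = 0.1)

fit_keep <- fit_model(dat, drop_env = FALSE)
fit_drop <- fit_model(dat, drop_env = TRUE)

# save() writes the same serialized representation
size_keep <- length(serialize(fit_keep, NULL))  # includes 'junk' via fit_keep$terms
size_drop <- length(serialize(fit_drop, NULL))

# the stripped model still predicts, as long as newdata is supplied
pred <- predict(fit_drop, newdata = dat[1:5, ])
```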
answered Oct 23 '25 05:10 by Jerzy


