Assigning multiple environments

Question

Can someone explain this behaviour to me?

a <- b <- c <- new.env()
a$this <- 1
b$this 
# 1
c$this 
# 1

I would have expected a, b and c to be distinct environments, just as ordinary variables created in the same manner would be.

However, three environments show up in the global environment, and any change made to one is reflected in all of them.

Tensibai · Accepted Answer

Disclaimer: This answer may not be totally SFW, as S-expressions, the common type for nearly all objects in R, are abbreviated SEXP (yep, S-EXPression; the hyphen is not where you thought). Now, as SALT 'N' PEPA sang: let's talk about SEXP!

TL;DR: An environment is stored in its parent environment as a pointer. Copying the variable used to access it just duplicates the pointer, and every copy still targets the same object.
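A minimal R sketch of the difference the question runs into: an ordinary object such as a list behaves as if copied when modified, whereas an environment is only ever referenced. The names l1, l2, e1 and e2 are purely illustrative.

```r
# Lists have copy-on-modify semantics: modifying l1 leaves l2 untouched.
l1 <- l2 <- list()
l1$this <- 1
is.null(l2$this)   # TRUE: l2 is still an empty list

# Environments are reference objects: e1 and e2 name the same environment.
e1 <- e2 <- new.env()
e1$this <- 1
e2$this            # 1: the change is visible through both names
```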

I did some digging into the root cause. The main question is what an environment actually is, or rather, how it is stored in its parent environment. Let's look at new.env:

> new.env
function (hash = TRUE, parent = parent.frame(), size = 29L) 
.Internal(new.env(hash, parent, size))
<bytecode: 0x0000000005972428>
<environment: namespace:base>
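The parent = parent.frame() default in the signature above means an environment created inside a function gets that function's evaluation environment as its parent. A small sketch to check this; f is a hypothetical helper.

```r
# Hypothetical helper: create an environment and compare its parent
# with the calling function's own evaluation environment.
f <- function() {
  e <- new.env()
  identical(parent.env(e), environment())
}
f()   # TRUE
```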

OK, time to go to the source code, in names.c:

{"new.env", do_newenv,  0,  11,     3,      {PP_FUNCALL, PREC_FN,   0}},

Searching for do_newenv brings us to builtin.c, which returns (I took a shortcut here, but let's keep this reasonably short):

ans = NewEnvironment(R_NilValue, R_NilValue, enclos);

NewEnvironment is defined in memory.c, and the comment above it gives us a clue about what is going on:

Create an environment by extending "rho" with a frame obtained by
pairing the variable names given by the tags on "namelist" with the values given by the elements of "valuelist".

The code itself is not so easy to follow:

SEXP NewEnvironment(SEXP namelist, SEXP valuelist, SEXP rho)
{
    SEXP v, n, newrho;

    if (FORCE_GC || NO_FREE_NODES()) {
        PROTECT(namelist);
        PROTECT(valuelist);
        PROTECT(rho);
        R_gc_internal(0);
        UNPROTECT(3);
        if (NO_FREE_NODES())
            mem_err_cons();
    }
    GET_FREE_NODE(newrho);
    newrho->sxpinfo = UnmarkedNodeTemplate.sxpinfo;
    INIT_REFCNT(newrho);
    TYPEOF(newrho) = ENVSXP;
    FRAME(newrho) = valuelist;
    ENCLOS(newrho) = CHK(rho);
    HASHTAB(newrho) = R_NilValue;
    ATTRIB(newrho) = R_NilValue;

    v = CHK(valuelist);
    n = CHK(namelist);
    while (v != R_NilValue && n != R_NilValue) {
        SET_TAG(v, TAG(n));
        v = CDR(v);
        n = CDR(n);
    }
    return (newrho);
}
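Some of the fields set above can be observed from R. This is a rough sketch, assuming the R-level accessors map onto them as the comments say: typeof() reports the ENVSXP type, parent.env() returns the enclosure (ENCLOS) passed as parent, and a fresh environment has no bindings. The variable name e is arbitrary.

```r
e <- new.env(parent = emptyenv())

typeof(e)                             # "environment": the ENVSXP type
identical(parent.env(e), emptyenv())  # TRUE: the enclosure is the parent we passed
length(ls(e))                         # 0: a new environment starts with no bindings
```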

Compare this with how gsetVar defines a variable in the base environment (example chosen for the sanity of the reader's mind):

void gsetVar(SEXP symbol, SEXP value, SEXP rho)
{
    if (FRAME_IS_LOCKED(rho)) {
        if (SYMVALUE(symbol) == R_UnboundValue)
            error(_("cannot add binding of '%s' to the base environment"),
                  CHAR(PRINTNAME(symbol)));
    }
#ifdef USE_GLOBAL_CACHE
    R_FlushGlobalCache(symbol);
#endif
    SET_SYMBOL_BINDING_VALUE(symbol, value);
}

We can see that the "value" accessible from the parent environment is the address of the new environment, i.e. the node obtained by GET_FREE_NODE (I'm not sure I'm being clear here, but I couldn't find a better phrasing).

So, since an assignment such as x <- value copies that pointer, we end up with multiple independent variables, all pointing to the same object.

Updating the object through any of these references updates the only object that exists in memory.
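The same behaviour can be reproduced at the R level: storing an environment as a binding inside another environment and fetching it back under a different name still yields the one underlying object. A sketch; holder, inner and alias are hypothetical names, and it relies on identical() comparing environments by reference rather than by content.

```r
# Store an environment as a binding inside another environment ...
holder <- new.env()
inner  <- new.env()
assign("child", inner, envir = holder)

# ... then fetch it back under a different name and modify it.
alias <- get("child", envir = holder)
alias$this <- 1

inner$this               # 1: there is only one environment object
identical(alias, inner)  # TRUE: same environment, not a copy
```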

SEXP stands for S-expression according to various literature, and in C it is essentially a pointer.

From the comments:

R language definition, thanks @BrodieG
A blog post about R's C interface
Thanks to @Alexis_laz for pushing me to dig into the real root cause in more detail.

Rich Scriven · Answer

new.env() is only being called once, though, creating only one new environment. They all get the same environment because you chained all of the assignments to the same new.env() call. Therefore, when you assign to one, you assign to them all.

a <- b <- c <- new.env()

a
# <environment: 0x49c1ed8>
b
# <environment: 0x49c1ed8>
c
# <environment: 0x49c1ed8>
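To confirm that the chained assignment evaluates its right-hand side only once, one can wrap new.env() in a counting function. make_env and calls are hypothetical names used only for this sketch; identical() serves as a programmatic alternative to comparing the printed addresses.

```r
calls <- 0
# Hypothetical wrapper that counts how often it is evaluated.
make_env <- function() {
  calls <<- calls + 1
  new.env()
}

a <- b <- c <- make_env()
calls                                # 1: the right-hand side runs only once
identical(a, b) && identical(b, c)   # TRUE: one environment, three names
```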

If you want them to be separate environments, don't chain the assignment (i.e. use three separate calls to new.env()).
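A sketch of both ways to get genuinely separate environments: one new.env() call per name, or lapply() when several are needed, since it calls the function once per element. envs is a hypothetical name.

```r
# One new.env() call per name: three distinct environments.
a <- new.env(); b <- new.env(); c <- new.env()
a$this <- 1
is.null(b$this)                   # TRUE: b is unaffected

# Several at once: each call to the anonymous function creates a fresh environment.
envs <- lapply(1:3, function(i) new.env())
identical(envs[[1]], envs[[2]])   # FALSE
```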

For completeness, bringing in Tensibai's comment:

this is a side effect of <-: your line of code is the same as a <- new.env(); b <- a; c <- a (which more obviously does not call new.env() 3 times, but binds its result to 3 variable names)

Assigning multiple environments

Tags:

r

Brandon Bertelsen

2 Answers

Tensibai

Rich Scriven
