Can someone explain this behaviour to me?
a <- b <- c <- new.env()
a$this <- 1
b$this
# 1
c$this
# 1
I would have expected a, b and c to be distinct environments, much like variables created in the same manner.
However, although three environments show up in the global environment, any action on one of them is applied to all of them.
Disclaimer: This answer may not be totally SFW, as S-expressions, the common type for nearly all objects in R, are abbreviated SEXP (yep, S-EXPression; the hyphen is not where you thought). Now, as Salt-N-Pepa sang: let's talk about SEXP!
TL;DR: An environment is stored in its parent environment as a pointer. Copying the variable used to access it just duplicates the pointer, which still targets the same object.
I did some digging into the root cause. The main reason lies in what an environment is or, more precisely, how it is stored in its parent environment. Let's look at new.env:
> new.env
function (hash = TRUE, parent = parent.frame(), size = 29L)
.Internal(new.env(hash, parent, size))
<bytecode: 0x0000000005972428>
<environment: namespace:base>
OK, time to go to the source code, in names.c:
{"new.env", do_newenv, 0, 11, 3, {PP_FUNCALL, PREC_FN, 0}},
Searching for do_newenv brings us to builtin.c, which returns (I took a shortcut here, but let's keep this from getting too long):
ans = NewEnvironment(R_NilValue, R_NilValue, enclos);
This NewEnvironment is defined in memory.c, and the comment above it gives us a clue about what is going on:
Create an environment by extending "rho" with a frame obtained by
pairing the variable names given by the tags on "namelist" with the values given by the elements of "valuelist".
The code itself is not so easy to follow:
SEXP NewEnvironment(SEXP namelist, SEXP valuelist, SEXP rho)
{
    SEXP v, n, newrho;
    if (FORCE_GC || NO_FREE_NODES()) {
        PROTECT(namelist);
        PROTECT(valuelist);
        PROTECT(rho);
        R_gc_internal(0);
        UNPROTECT(3);
        if (NO_FREE_NODES())
            mem_err_cons();
    }
    GET_FREE_NODE(newrho);
    newrho->sxpinfo = UnmarkedNodeTemplate.sxpinfo;
    INIT_REFCNT(newrho);
    TYPEOF(newrho) = ENVSXP;
    FRAME(newrho) = valuelist;
    ENCLOS(newrho) = CHK(rho);
    HASHTAB(newrho) = R_NilValue;
    ATTRIB(newrho) = R_NilValue;
    v = CHK(valuelist);
    n = CHK(namelist);
    while (v != R_NilValue && n != R_NilValue) {
        SET_TAG(v, TAG(n));
        v = CDR(v);
        n = CDR(n);
    }
    return (newrho);
}
Compare this to a variable definition in the global environment by gsetVar (example chosen for the sanity of the reader's mind):
void gsetVar(SEXP symbol, SEXP value, SEXP rho)
{
    if (FRAME_IS_LOCKED(rho)) {
        if(SYMVALUE(symbol) == R_UnboundValue)
            error(_("cannot add binding of '%s' to the base environment"),
                  CHAR(PRINTNAME(symbol)));
    }
#ifdef USE_GLOBAL_CACHE
    R_FlushGlobalCache(symbol);
#endif
    SET_SYMBOL_BINDING_VALUE(symbol, value);
}
We can see that the "value" accessible from the parent environment is the address of the new environment, i.e. the node obtained by GET_FREE_NODE (I'm not sure I'm being clear here, but I couldn't find a better phrasing).
So, since <- is defined as x <- value, we are copying a pointer: we end up with multiple independent variables, all pointing to the same object.
Updating the object through any of these references updates the only object that exists in memory.
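To make the contrast with ordinary variables concrete, here is a minimal sketch (the names x and y are only for illustration). It assumes nothing beyond base R: a numeric vector assigned in a chain behaves as if each name had its own copy, while an environment assigned the same way stays a single shared object.

```r
# Vectors: modifying through one name leaves the other name untouched
x <- y <- c(1, 2, 3)
x[1] <- 99
y[1]
# 1

# Environments: both names refer to the very same object
a <- b <- new.env()
a$this <- 1
b$this
# 1
identical(a, b)
# TRUE
```

identical() on two environments checks whether they are the same object, not whether they hold equal contents, which is why it is a handy test here.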
SEXP stands for S-expression according to various literature, and is essentially a pointer in C.
From the comments: new.env() is only called once, so only one new environment is created. All three variables get the same environment because you chained all of the assignments to the same new.env() call. Therefore, when you assign to one, you assign to them all.
a <- b <- c <- new.env()
a
# <environment: 0x49c1ed8>
b
# <environment: 0x49c1ed8>
c
# <environment: 0x49c1ed8>
If you want them to be separate environments, don't chain the assignment (i.e. use three separate calls to new.env()).
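A short sketch of the unchained version, using the same variable names as the question, shows that the three environments are then independent:

```r
a <- new.env()
b <- new.env()
c <- new.env()

a$this <- 1

identical(a, b)
# FALSE
b$this
# NULL
exists("this", envir = c, inherits = FALSE)
# FALSE
```

Printing a, b and c in this case would show three different addresses, although the exact values vary from session to session.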
For completeness, bringing Tensibai's comment in:
this is a side effect of <-. Your line of code is the same as a <- new.env(); b <- a; c <- a
(which more obviously does not call new.env() 3 times, but binds its result to 3 variable names)
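One way to check that the right-hand side of a chained assignment is evaluated only once is to wrap new.env() in a counting function. The helper make_env and the counter n_calls below are hypothetical names introduced purely for this sketch:

```r
n_calls <- 0

# Hypothetical wrapper: counts how often it is invoked, then returns a fresh environment
make_env <- function() {
  n_calls <<- n_calls + 1
  new.env()
}

a <- b <- c <- make_env()

n_calls
# 1
identical(a, c)
# TRUE
```

The counter ends at 1, not 3, which matches the reading of the chain as a <- (b <- (c <- make_env())).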