subset_base is a simplified version of base R's subset() function. The example is taken from Advanced R, Section 20.6.1.
On its own, this function works fine:
subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, parent.frame())
  data[rows_val, , drop = FALSE]
}
my_df <- data.frame(x = 1:3)
subset_base(my_df, x == 1)
#>   x
#> 1 1
However, suppose we write a wrapper function apply_subset that defines a value zzz and, in the same function, calls lapply() with subset_base and the expression x == zzz as arguments. In that case zzz can't be found.
I want to better understand why zzz can't be found. My mental model was the following: subset_base is called by lapply, which is called by apply_subset. Within subset_base we evaluate the rows argument in the supplied data frame data, which is automatically converted to an environment. The enclosure of this environment is the calling environment, parent.frame(), which should be the execution environment of lapply. I thought that this environment, in turn, has the calling environment of lapply as its parent, which would be the execution environment of apply_subset. But this doesn't seem to be the case: if it were true, zzz would be found.
apply_subset <- function() {
  zzz <- 2
  dfs <- list(data.frame(x = 1:3), data.frame(x = 4:6))
  lapply(dfs, FUN = subset_base, x == zzz)
}
apply_subset()
#> Error in eval(rows, data, parent.frame()): object 'zzz' not found
Even stranger, when we change the call to lapply so that subset_base is no longer supplied directly as a function object but is instead wrapped in an anonymous function, zzz can be found.
apply_subset2 <- function() {
  zzz <- 2
  dfs <- list(data.frame(x = 1:3), data.frame(x = 4:6))
  lapply(dfs, FUN = \(df) subset_base(df, x == zzz))
}
apply_subset2()
#> [[1]]
#>   x
#> 2 2
#>
#> [[2]]
#> [1] x
#> <0 rows> (or 0-length row.names)
Created on 2024-01-27 with reprex v2.0.2
I would highly appreciate an explanation of (1) what the parent environments of each execution environment in the call stack are, (2) why my assumption that they align is, apparently, wrong, and (3) why this changes when the call to lapply uses an anonymous function. Further, it would be great to know whether (4) there is any way to make the call to lapply work without using an anonymous function (Advanced R seems to suggest that perhaps the only solution is to use rlang's quosures).
The key point is that when we evaluate an expression with the parent frame as the enclosure, R looks in that one frame of the call stack. If the name is not found there, R does not continue up the call stack. Instead it looks in the environment in which the caller was defined, and then recursively through that environment's ancestors.
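A minimal sketch of this distinction. The names probe_fn, middle_fn, outer_fn, outer_direct, and secret are hypothetical and not from the question; the sketch assumes no object called secret exists at top level or on the search path.

```r
# probe_fn() checks whether `secret` is visible from its caller's frame.
# exists() with the default inherits = TRUE starts at `envir` and then
# follows enclosing (parent) environments, not the call stack.
probe_fn <- function() {
  exists("secret", envir = parent.frame())
}

# middle_fn() is defined at top level, so its enclosure is the top-level
# environment, not the frame of whoever happens to call it.
middle_fn <- function() probe_fn()

outer_fn <- function() {
  secret <- "defined in outer_fn"
  middle_fn()   # probe_fn's caller is middle_fn; `secret` is two frames up
}

outer_direct <- function() {
  secret <- "defined in outer_direct"
  probe_fn()    # probe_fn's caller is outer_direct itself
}

via_middle <- outer_fn()      # FALSE: lookup does not climb the call stack
direct     <- outer_direct()  # TRUE: `secret` is in the immediate caller
```

In the question, lapply plays the role of middle_fn: it sits between the frame that holds the variable and the function that looks the variable up.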
With this background we can answer the questions:
(1) what the parent environments of each execution environment in the call stack are,
When apply_subset is run, the function subset_base is passed to lapply, and the code inside lapply then invokes subset_base. So the parent frame of subset_base is the execution environment of lapply, not the execution environment of apply_subset where zzz resides. zzz is not in the execution environment of lapply, so the next place R looks is the parent environment of that execution environment, which is where lapply was defined (not further up the call stack). In this case that is the base namespace, but zzz is not there either. R then looks through the global environment and all packages on the search path, but zzz is not there, hence the error.
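One way to check this claim is to ask, from inside a function called by lapply, what the enclosure of the calling frame is. The helper name caller_defined_with_lapply is hypothetical; the sketch relies on environment(lapply) being the environment in which lapply was defined.

```r
# Returns TRUE if the enclosure of the calling frame is the environment
# in which lapply() itself was defined (the base namespace).
caller_defined_with_lapply <- function(x) {
  identical(parent.env(parent.frame()), environment(lapply))
}

# Called through lapply(), the calling frame is lapply's execution
# environment, whose enclosure is where lapply was defined.
from_lapply <- lapply(1, caller_defined_with_lapply)[[1]]
from_lapply
```

This should return TRUE, which is consistent with the lookup path described above: lapply's frame first, then the base namespace, then the global environment and the search path.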
(2) why my assumption that they align is, apparently, wrong, and
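See above: the parent of an execution environment is the environment in which the function was defined, not the environment from which it was called.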
(3) why this changes when the call to lapply uses an anonymous function.
When the anonymous function invokes subset_base, the parent frame of subset_base is the anonymous function's execution environment. zzz is not there, so R looks in the parent environment of the anonymous function (not further up the call stack). Since the anonymous function was defined in apply_subset, that parent environment is the execution environment of apply_subset, where zzz is found.
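The same kind of check can be sketched for the anonymous-function case. The names caller_enclosure and wrapper_fn are hypothetical stand-ins for subset_base and apply_subset2.

```r
# Returns the enclosure (parent environment) of whichever frame called it.
caller_enclosure <- function() parent.env(parent.frame())

wrapper_fn <- function() {
  here <- environment()  # execution environment of wrapper_fn
  # The anonymous function is created inside wrapper_fn, so its enclosure
  # is `here`, even though lapply() is the one that calls it.
  lapply(1, function(i) identical(caller_enclosure(), here))[[1]]
}

same_env <- wrapper_fn()
same_env
```

This should return TRUE: the frame that calls caller_enclosure belongs to the anonymous function, and its enclosure is the wrapper's execution environment, which is why variables defined in the wrapper are reachable.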
(4) there is any way to make the call to lapply work without using an anonymous function
The way to do this, which generalizes to other situations, is to pass the environment around explicitly via an envir= argument, like this (changed lines are marked with ##):
subset_base <- function(data, rows, envir = parent.frame()) { ##
  rows <- substitute(rows)
  rows_val <- eval(rows, data, envir) ##
  data[rows_val, , drop = FALSE]
}
apply_subset <- function() {
  zzz <- 2
  dfs <- list(data.frame(x = 1:3), data.frame(x = 4:6))
  lapply(dfs, FUN = subset_base, x == zzz, envir = environment()) ##
}
apply_subset()
giving
[[1]]
  x
2 2

[[2]]
[1] x
<0 rows> (or 0-length row.names)
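Another base-R option, not part of the answer above and offered only as a sketch, is to splice the current value of zzz into the call before lapply() runs, so that no lookup of zzz is needed later. The name apply_subset3 is hypothetical; subset_base here is the original two-argument definition from the question.

```r
# Original definition from the question (no envir argument).
subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, parent.frame())
  data[rows_val, , drop = FALSE]
}

apply_subset3 <- function() {
  zzz <- 2
  dfs <- list(data.frame(x = 1:3), data.frame(x = 4:6))
  # bquote() replaces .(zzz) with its value, producing the call
  # lapply(dfs, FUN = subset_base, x == 2), which eval() then runs in
  # this function's environment (where dfs lives).
  eval(bquote(lapply(dfs, FUN = subset_base, x == .(zzz))))
}

res <- apply_subset3()
res
```

Because the value rather than the name is inlined, this suits simple values such as numbers or strings. For anything that has to be looked up by name at evaluation time, the explicit envir= argument is the more general approach.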