
Why do we need to load data from the datasets (or any other) package?

It seems that even if I don't load mtcars using

data(mtcars)

I can still work with that data frame, and I see that it lives not in the global environment but in its package environment, as a promise.

# these run without first calling data(mtcars)
View(mtcars)
ncol(mtcars)

I read someone's code and found that they use data() to load it. But why do we need to load it into memory? What can't we do without explicitly loading it through data()?

Since the datasets package is on the search() path, ordinary name lookup from the global environment already reaches the promise there.
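A quick way to check where the name is actually found, as a minimal sketch that assumes a fresh R session with the default packages attached and no object called mtcars created by hand:

```r
# datasets is one of the packages attached by default.
"package:datasets" %in% search()

# find() reports every attached environment that defines the name.
find("mtcars")

# The global environment itself has no binding for mtcars
# (assumption: nobody has called data(mtcars) or assigned mtcars yet).
exists("mtcars", envir = globalenv(), inherits = FALSE)

# Yet the data frame is usable, either through the search path
# or by naming the package explicitly.
ncol(mtcars)
nrow(datasets::mtcars)
```

So the lookup walks from the global environment along the search path until it reaches package:datasets, which is why no data() call is needed here.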

cloudscomputes asked Nov 30 '25 16:11


1 Answer

This entirely depends on whether the package defining the dataset uses lazy data or not. These days, the vast majority of packages use lazy data, so calling utils::data() is not required.

To quote chapter 8 from R packages:

It is important to note that lazily-loaded datasets do not need to be pre-loaded with utils::data() and, in fact, it’s usually best to avoid doing so. Above, once we did library(nycflights13), we could immediately access flights. There is no call to data(flights), because it is not necessary.

There are specific downsides to data(some_pkg_data) calls that support a policy of only using data() when it is actually necessary, i.e. for datasets that would not be available otherwise:

  • By default, data(some_pkg_data), creates one or more objects in the user’s global workspace. There is the potential to silently overwrite pre-existing objects with new values.
  • There is also no guarantee that data(foo) will create exactly one object named “foo”. It could create more than one object and/or objects with totally different names.
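Both downsides can be reproduced with the datasets package itself. The following is a minimal sketch; the variable names `scratch` and `scratch2` are made up for illustration, and the string assigned to `mtcars` stands in for any object a user might already have under that name.

```r
# A pre-existing object that happens to share the dataset's name.
mtcars <- "my own notes"

# data() writes into the global environment by default,
# replacing the object above without any warning.
data(mtcars)
class(mtcars)  # now "data.frame", the notes are gone

# Passing envir= keeps the loaded objects out of the global workspace.
scratch <- new.env()
data("mtcars", package = "datasets", envir = scratch)
ls(scratch)

# One data() call can also create objects whose names differ from
# the argument: "beavers" yields beaver1 and beaver2.
scratch2 <- new.env()
data("beavers", package = "datasets", envir = scratch2)
ls(scratch2)
```

When a dataset is lazily loaded, referring to it as datasets::mtcars (or simply mtcars once the package is attached) avoids both problems, since nothing is copied into the global workspace.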
Konrad Rudolph answered Dec 02 '25 05:12