
R Recover original data.frame from model.frame

In R, you can fit GAMs with the mgcv package using a formula that contains transformations such as log or sqrt. By default the fitted object stores the model.frame, which holds only the variables named in the formula, with the transformations already applied.

Is there any way I can recover the untransformed data.frame?

Example:

reg <- mgcv::gam(log(mpg) ~ disp + I(hp^2), data=mtcars)

returns

> head(reg$model, 3)
              log(mpg) disp I(hp^2)
Mazda RX4     3.044522  160   12100
Mazda RX4 Wag 3.044522  160   12100
Datsun 710    3.126761  108    8649

But I want to get this untransformed dataset back from the model object:

               mpg disp  hp
Mazda RX4     21.0  160 110
Mazda RX4 Wag 21.0  160 110
Datsun 710    22.8  108  93

Some background: the newdata argument of most models' predict() methods requires untransformed data, so I cannot feed the model.frame back into predict(). I am already aware that omitting the newdata argument returns the fitted values. My requirement is that the model object gives me back the raw data.
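To illustrate the background point, here is a minimal sketch (the names raw and p_raw are only illustrative) showing that predict() takes the raw columns and applies the formula's transformations itself:

```r
library(mgcv)

reg <- gam(log(mpg) ~ disp + I(hp^2), data = mtcars)

# newdata holds the raw predictors; predict() evaluates I(hp^2) itself
raw <- mtcars[1:3, c("disp", "hp")]
p_raw <- predict(reg, newdata = raw)

# Predictions are on the log(mpg) scale and should match the fitted
# values for the same three rows
all.equal(as.numeric(p_raw), as.numeric(fitted(reg)[1:3]))
```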

Steven M. Mortimer asked Sep 17 '25 11:09



2 Answers

Here is one way: use glm instead of lm, even for Gaussian data. A glm object carries more components than an lm object, including the original data frame (stored as $data).
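A minimal sketch of that idea, applied to the formula from the question. The object names fit_glm and vars are only illustrative, and gaussian() is glm's default family:

```r
# Same formula as in the question, fitted with glm instead of lm
fit_glm <- glm(log(mpg) ~ disp + I(hp^2), data = mtcars)

# fit_glm$model is the transformed model frame;
# fit_glm$data is the object that was passed as `data`
vars <- all.vars(formula(fit_glm))  # "mpg" "disp" "hp"
head(fit_glm$data[vars], 3)
```

Note that $data holds every column of the supplied data frame, so subsetting by all.vars() is what restricts it to the variables in the formula.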


Well, if you are asking mgcv questions, you had better provide an mgcv example.

mgcv follows the same conventions as glm. Read ?gamObject for the full list of what gam can return. You will see that it can return the data if you set keepData through the control argument of gam. When you call gam, add the following:

control = gam.control(keepData = TRUE)

Here is a simple, reproducible example:

set.seed(1)  # fix the seed so the simulated data are reproducible
dat <- data.frame(x = runif(50), y = rnorm(50))
library(mgcv)
fit <- gam(y ~ s(x, bs = 'cr', k = 5), data = dat, control = gam.control(keepData = TRUE))
head(fit$model)  # model frame
head(fit$data)  # original data
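Applied to the model from the question, the same setting might look like the sketch below. The names raw and p are only illustrative, and it relies on formula() and predict() working on the gam fit:

```r
library(mgcv)

reg <- gam(log(mpg) ~ disp + I(hp^2), data = mtcars,
           control = gam.control(keepData = TRUE))

names(reg$model)                         # transformed columns of the model frame
raw <- reg$data[all.vars(formula(reg))]  # untransformed mpg, disp, hp
head(raw, 3)

# The stored raw data can be passed straight back to predict()
p <- predict(reg, newdata = reg$data)
all.equal(as.numeric(p), as.numeric(fitted(reg)))
```

Keeping the data makes the fitted object larger, which is presumably why keepData defaults to FALSE.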
Zheyuan Li answered Sep 20 '25 02:09



We can extract the variable names from the 'terms' component and use them to subset the original dataset:

head(mtcars[all.vars(reg$terms)], 3)
#               mpg disp  hp
#Mazda RX4     21.0  160 110
#Mazda RX4 Wag 21.0  160 110
#Datsun 710    22.8  108  93

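Or with the call, taking the last name returned by all.vars as the data object and the remaining names as its columns: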

v1 <- all.vars(reg$call)
head(get(tail(v1, 1))[head(v1, -1)], 3)
#               mpg disp  hp
#Mazda RX4     21.0  160 110
#Mazda RX4 Wag 21.0  160 110
#Datsun 710    22.8  108  93
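Both approaches subset the original data object, so it must still exist in the workspace, and they return every row even if some were dropped for missing values at fit time. One possible way to keep only the rows that entered the fit is to index by the model frame's row names. This is a sketch under the assumption that the data have unique row names; d, reg_na and raw_used are hypothetical names, and the NA is introduced purely for illustration:

```r
library(mgcv)

d <- mtcars
d$hp[2] <- NA  # hypothetical missing value, so one row is dropped at fit time

reg_na <- gam(log(mpg) ~ disp + I(hp^2), data = d)

vars <- all.vars(formula(reg_na))
# The model frame keeps the row names of the rows that were actually used
raw_used <- d[rownames(reg_na$model), vars]

nrow(d)         # 32
nrow(raw_used)  # 31
```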
akrun answered Sep 20 '25 01:09
