As far as I can tell, both Stata and R have a "predict" function. I'm trying to replicate results that were performed in Stata using R, and the results involve calculating standard deviations of predicted values. Is there a functionality in R, maybe using its "predict" function, that will allow me to do this? I can't seem to replicate the results perfectly. In case it helps, the Stata code does the following:
reg Y X1 X2 if condition
predict resid, r
predict stdf, stdf
The definition of the stdf
argument is:
stdf
calculates the standard error of the forecast, which is the standard error of the point prediction for 1 observation. It is commonly referred to as the standard error of the future or forecast value. By construction, the standard errors produced bystdf
are always larger than those produced bystdp
; see Methods and formulas in [R] predict
And the R code I've been writing is:
fit <- lm(Y ~ X1 + X2, data=df)
new.df <- data.frame(...) # This is a new data frame with my new data period I want to predict in
predict(fit, new.df, se.fit = TRUE)
However, when I convert the standard errors to standard deviations, they don't match the Stata output.
Thanks in advance!
Looks to me that you need:
predict(fit, new.df, se.fit = TRUE, interval="prediction")
"Standard errors" apply to the confidence limits around the estimate of the mean, while prediction errors might easily be described as "standard deviations" around predictions.
> dfrm <- data.frame(a=rnorm(30), drop=FALSE)
> dfrm$y <- 4+dfrm$a*5+0.5*rnorm(30)
> plot( dfrm$a, predict(mod) )
> plot( dfrm$a, predict(mod, newdata=dfrm) )
> points( rep(seq(-2,2,by=0.1),2), # need two copies for upper and lower
c(predict(mod, newdata=list(a=seq(-2,2,by=0.1)),
interval="prediction")[, c("lwr","upr")]),
col="red")
> points(dfrm$a, dfrm$y, col="blue" )
Following @BondedDust's example: he shows how to get prediction intervals (+/- 1.96*std_dev). In principle you could recover the
set.seed(1001)
dfrm <- data.frame(a=rnorm(30), drop=FALSE)
dfrm$y <- 4+dfrm$a*5+0.5*rnorm(30)
Fit model:
mod <- lm(y ~ a, data=dfrm)
Predict:
pframe <- data.frame(a=seq(-2,2,by=0.1))
pred <- predict(mod,newdata=pframe,se.fit=TRUE)
pframe$y <- pred$fit
pframe$se <- pred$se.fit
pframe$sd <- sqrt(pred$se.fit^2+sigma(mod)^2)
Results:
head(pframe,3)
## a y se sd
## 1 -2.0 -5.877806 0.2498792 0.7319977
## 2 -1.9 -5.380531 0.2403656 0.7288049
## 3 -1.8 -4.883256 0.2309916 0.7257673
Check against the prediction interval:
pred2 <- predict(mod,newdata=pframe,interval="predict")
wid <- qt(0.975,df=pred$df)
all.equal(unname(pred2[,"lwr"]),with(pframe,y-wid*sd)) ## TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With