Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between Stata's standard deviations from predict and R's standard errors from predict?

As far as I can tell, both Stata and R have a "predict" function. I'm trying to replicate results that were performed in Stata using R, and the results involve calculating standard deviations of predicted values. Is there a functionality in R, maybe using its "predict" function, that will allow me to do this? I can't seem to replicate the results perfectly. In case it helps, the Stata code does the following:

reg Y X1 X2 if condition
predict resid, r
predict stdf, stdf

The definition of the stdf argument is:

stdf calculates the standard error of the forecast, which is the standard error of the point prediction for 1 observation. It is commonly referred to as the standard error of the future or forecast value. By construction, the standard errors produced by stdf are always larger than those produced by stdp; see Methods and formulas in [R] predict

And the R code I've been writing is:

fit <- lm(Y ~ X1 + X2, data=df)
new.df <- data.frame(...) # This is a new data frame with my new data period I want to predict in
predict(fit, new.df, se.fit = TRUE)

However, when I convert the standard errors to standard deviations, they don't match the Stata output.

Thanks in advance!

like image 853
brian Avatar asked Oct 15 '25 16:10

brian


2 Answers

Looks to me that you need:

 predict(fit, new.df, se.fit = TRUE, interval="prediction")

"Standard errors" apply to the confidence limits around the estimate of the mean, while prediction errors might easily be described as "standard deviations" around predictions.

> dfrm <- data.frame(a=rnorm(30), drop=FALSE)
> dfrm$y <- 4+dfrm$a*5+0.5*rnorm(30)
> plot( dfrm$a, predict(mod) )
> plot( dfrm$a, predict(mod, newdata=dfrm) )
> points( rep(seq(-2,2,by=0.1),2),   # need two copies for upper and lower
          c(predict(mod, newdata=list(a=seq(-2,2,by=0.1)), 
                         interval="prediction")[, c("lwr","upr")]), 
          col="red")
> points(dfrm$a, dfrm$y, col="blue" )

enter image description here

like image 198
IRTFM Avatar answered Oct 17 '25 06:10

IRTFM


Following @BondedDust's example: he shows how to get prediction intervals (+/- 1.96*std_dev). In principle you could recover the

 set.seed(1001)
 dfrm <- data.frame(a=rnorm(30), drop=FALSE)
 dfrm$y <- 4+dfrm$a*5+0.5*rnorm(30)

Fit model:

 mod <- lm(y ~ a, data=dfrm)

Predict:

 pframe <- data.frame(a=seq(-2,2,by=0.1))
 pred <- predict(mod,newdata=pframe,se.fit=TRUE)
 pframe$y <- pred$fit
 pframe$se <- pred$se.fit
 pframe$sd <- sqrt(pred$se.fit^2+sigma(mod)^2)

Results:

 head(pframe,3)
 ##      a         y        se        sd
 ## 1 -2.0 -5.877806 0.2498792 0.7319977
 ## 2 -1.9 -5.380531 0.2403656 0.7288049
 ## 3 -1.8 -4.883256 0.2309916 0.7257673

Check against the prediction interval:

 pred2 <- predict(mod,newdata=pframe,interval="predict")
 wid <- qt(0.975,df=pred$df)
 all.equal(unname(pred2[,"lwr"]),with(pframe,y-wid*sd))  ## TRUE
like image 31
Ben Bolker Avatar answered Oct 17 '25 07:10

Ben Bolker