Does the predict function in the caret package use future information when preprocessing?

My question is pretty simple, but I can't find a clear-cut answer in the caret package documentation. If I use the preprocessing options center and scale in my train function, the documentation states that the same preprocessing will be applied to a new data set when making predictions.

So when I use the predict function, does that mean the mean and standard deviation of the training set are applied to the new data? Or are a new centering and scaling computed from the new data set, which could use points from the future if the data are a time series (which is problematic)?

Thank you

mlal asked Nov 24 '25 17:11


1 Answer

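caret's predict.train reuses the preprocessing parameters stored in the model you built, so the centering and scaling values estimated on the training set are what get applied to the new data. Nothing is re-estimated from the new data.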

Here is a snippet from the source code that shows the preProc data comes from the object's preProcess parameters:

out <- predictionFunction(method = object$modelInfo, 
            modelFit = object$finalModel, newdata = newdata, 
            preProc = object$preProcess)

You can see these parameters for yourself after creating your model by accessing object$preProcess. Here is a complete example:

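library(caret)
set.seed(4444)

# Split mtcars into training (75%) and testing (25%) sets
data(mtcars)
inTrain <- createDataPartition(y = mtcars$mpg, p = 0.75, list = FALSE)
training <- mtcars[inTrain, ]
testing <- mtcars[-inTrain, ]

# Fit a linear model with centering and scaling estimated on the training set
lmFit <- train(mpg ~ ., data = training, method = "lm", preProc = c("center", "scale"))

# The stored preProcess object, with the training means and standard deviations
lmFit$preProcess
lmFit$preProcess$mean
lmFit$preProcess$std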
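
If you want to check this numerically, here is a minimal sketch. It assumes that calling preProcess directly on the predictors behaves the same way as the preProc step inside train; the names trainX, testX, pp, testScaled and manual are only for illustration. It compares caret's output on the test set with a manual scaling that uses the training means and standard deviations.

```r
library(caret)
set.seed(4444)

data(mtcars)
inTrain <- createDataPartition(y = mtcars$mpg, p = 0.75, list = FALSE)
predictors <- setdiff(names(mtcars), "mpg")
trainX <- mtcars[inTrain, predictors]
testX <- mtcars[-inTrain, predictors]

# Estimate centering and scaling parameters on the training predictors only
pp <- preProcess(trainX, method = c("center", "scale"))

# Apply the stored parameters to the test predictors
testScaled <- predict(pp, testX)

# Manual version: scale the test set with the TRAINING means and sds
manual <- scale(testX, center = pp$mean[predictors], scale = pp$std[predictors])

# Should be TRUE: caret used the training statistics
sameAsTrainingStats <- isTRUE(all.equal(as.matrix(testScaled), manual,
                                        check.attributes = FALSE))

# Should be FALSE: the test set's own mean and sd were not used
sameAsTestStats <- isTRUE(all.equal(as.matrix(testScaled), scale(testX),
                                    check.attributes = FALSE))

print(sameAsTrainingStats)
print(sameAsTestStats)
```

If the first comparison holds and the second does not, the new data were transformed with the training statistics only, so no information from the new (possibly future) observations entered the preprocessing.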

ddunn801 answered Nov 27 '25 10:11


