I'm using random forest regression in R and came across the parameter corr.bias, which the manual describes as "experimental". My data is nonlinear, and I wonder whether setting this parameter to TRUE can improve the results. I also don't know exactly how it works for nonlinear data, so I would really appreciate it if someone could explain how this bias correction works in the randomForest package and whether it can improve my regression model.
The short answer is that it performs a simple correction based on a linear regression of the actual values on the fitted values.
From regrf.c:
/* Do simple linear regression of y on yhat for bias correction. */
if (*biasCorr) simpleLinReg(nsample, yptr, y, coef, &errb, nout);
and the first few lines of that function are simply:
void simpleLinReg(int nsample, double *x, double *y, double *coef,
                  double *mse, int *hasPred) {
    /* Compute simple linear regression of y on x, returning the coefficients,
       the average squared residual, and the predicted values (overwriting y). */
So when you fit a regression random forest with corr.bias = TRUE, the model object returned will contain a coefs element, which is simply the two coefficients (intercept and slope) from that linear regression.
Then when you call predict.randomForest this happens:
## Apply bias correction if needed.
yhat <- rep(NA, length(rn))
names(yhat) <- rn
if (!is.null(object$coefs)) {
    yhat[keep] <- object$coefs[1] + object$coefs[2] * ans$ypred
}
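To make the arithmetic concrete, here is a minimal sketch of the same correction in plain R. It does not call the package at all: y and yhat are simulated stand-ins (my assumption, not package output), with yhat deliberately shrunk toward the mean the way raw forest predictions often are.

```r
set.seed(1)

# Simulated stand-ins (hypothetical data, not produced by randomForest):
# y is the "actual" response, yhat plays the role of the raw predictions.
y    <- rnorm(200, mean = 10, sd = 3)
yhat <- 10 + 0.6 * (y - 10) + rnorm(200, sd = 0.5)

# Simple linear regression of y on yhat, the same kind of fit simpleLinReg does.
coefs <- unname(coef(lm(y ~ yhat)))

# Same formula as in the predict code: intercept + slope * raw prediction.
yhat_corrected <- coefs[1] + coefs[2] * yhat

mse_raw       <- mean((y - yhat)^2)
mse_corrected <- mean((y - yhat_corrected)^2)
print(c(raw = mse_raw, corrected = mse_corrected))
```

On the data used to fit the line, the corrected error cannot be larger than the raw error, because leaving the predictions unchanged (intercept 0, slope 1) is one of the lines least squares could have chosen. That guarantee does not carry over to new data.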
The non-linear nature of your data isn't necessarily relevant in itself, since the correction is applied to the predictions rather than to the predictors. However, the bias correction may be very poor if the relationship between the fitted and actual values is far from linear.
You can always fit the model and then plot the fitted vs actual values yourself and see whether a correction based on a linear regression would help or not.
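A rough sketch of that check, assuming the randomForest package is installed. The built-in mtcars data is only a stand-in for your own data, and I use the out-of-bag predictions stored in the predicted element of the fitted object as the "fitted" values.

```r
library(randomForest)

set.seed(42)

# Fit a regression forest without bias correction (mtcars is just a stand-in).
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500)

actual <- mtcars$mpg
fitted <- rf$predicted  # out-of-bag predictions for the training rows

# Regress actual on fitted, which has the same form as the correction.
fit <- lm(actual ~ fitted)
print(coef(fit))

plot(fitted, actual, xlab = "Fitted (OOB)", ylab = "Actual")
abline(0, 1, lty = 2)                        # line of perfect agreement
abline(fit, col = "red")                     # straight-line correction
lines(lowess(fitted, actual), col = "blue")  # smooth trend, to reveal curvature
```

If the red line sits close to the dashed line, the correction will change very little. If the blue curve bends noticeably away from the red line, a linear correction is unlikely to capture the bias well.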