Calculate R-squared (%Var explained) from combined randomForest regression object

When fitting a randomForest regression, the printed object reports the R-squared as "% Var explained: ...".

library(randomForest)
library(doSNOW)
library(foreach)
library(ggplot2)

dat <- data.frame(ggplot2::diamonds[1:1000,1:7])
rf <- randomForest(formula = carat ~ ., data = dat, ntree = 500)
rf
# Call:
#   randomForest(formula = carat ~ ., data = dat, ntree = 500) 
#                Type of random forest: regression
#                      Number of trees: 500
# No. of variables tried at each split: 2
# 
# Mean of squared residuals: 0.001820046
# % Var explained: 95.22

However, when using a foreach loop to fit and combine multiple randomForest objects, the R-squared values are not available, as noted in ?combine:

The confusion, err.rate, mse and rsq components (as well as the corresponding components in the test component, if exist) of the combined object will be NULL

cl <- makeCluster(8)
registerDoSNOW(cl)

rfPar <- foreach(ntree=rep(63,8), 
                 .combine = combine, 
                 .multicombine = TRUE, 
                 .packages = "randomForest") %dopar% 
                 {
                   randomForest(formula = carat ~ ., data = dat, ntree = ntree)
                 }
stopCluster(cl)

rfPar
# Call:
#   randomForest(formula = carat ~ ., data = dat, ntree = ntree) 
#                Type of random forest: regression
#                      Number of trees: 504
# No. of variables tried at each split: 2

Since it was not really answered in this question: is it at all possible to calculate the R-squared (% Var explained) and the mean of squared residuals from a randomForest object afterwards?

(Critics of this parallelization might argue to use caret::train(... method = "parRF"), or others. However, this turns out to take forever. In fact, this might be useful for anybody who uses combine to merge randomForest objects...)

asked Nov 15 '25 09:11 by loki


1 Answer

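Yes. You can calculate the R-squared value after the fact by predicting on your training data with the trained model and comparing the predictions to the actual values. Note that this is an in-sample R-squared: the "% Var explained" printed by randomForest is based on out-of-bag predictions, so the value computed here will usually be higher.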

# taking the object from the question:
actual <- dat$carat
predicted <- unname(predict(rfPar, dat))

R2 <- 1 - (sum((actual-predicted)^2)/sum((actual-mean(actual))^2))
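
To make the formula easier to check by hand, here is a minimal sketch in base R. The helper name r_squared is hypothetical (it is not part of randomForest or caret), and the five data points are made up purely for illustration.

```r
# Hypothetical helper, not part of randomForest or caret:
# R-squared = 1 - (residual sum of squares / total sum of squares)
r_squared <- function(actual, predicted) {
  ss_res <- sum((actual - predicted)^2)      # unexplained variation
  ss_tot <- sum((actual - mean(actual))^2)   # total variation around the mean
  1 - ss_res / ss_tot
}

# Made-up numbers for illustration only
actual_demo    <- c(1, 2, 3, 4, 5)
predicted_demo <- c(1.1, 1.9, 3.2, 3.8, 5.0)

# ss_res = 0.01 + 0.01 + 0.04 + 0.04 + 0 = 0.10, ss_tot = 10
r2_demo <- r_squared(actual_demo, predicted_demo)
r2_demo  # 0.99
```

With the objects from the question this would be called as r_squared(dat$carat, predicted).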

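Or the mean squared error (note that caret::RMSE returns its square root):

mean((actual - predicted)^2)    # mean squared error
caret::RMSE(predicted, actual)  # root mean squared error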
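
The figures above come from predicting the training rows, whereas the printed "% Var explained" relies on out-of-bag (OOB) predictions. The following is a rough sketch of an OOB-style alternative, under the assumption that the combined object still carries a predicted component with OOB predictions averaged over the sub-forests; this is worth verifying for your randomForest version, for example with str(rfPar). The helper name oob_metrics and the stand-in object mock_fit are hypothetical.

```r
# Hypothetical helper: MSE and "% Var explained" from OOB predictions.
# ASSUMPTION: fit$predicted holds out-of-bag predictions and fit$y the response.
oob_metrics <- function(fit, actual = fit$y) {
  ok  <- !is.na(fit$predicted)            # rows that were never OOB have no prediction
  res <- actual[ok] - fit$predicted[ok]
  mse <- mean(res^2)
  rsq <- 1 - sum(res^2) / sum((actual[ok] - mean(actual[ok]))^2)
  c(mse = mse, pct_var_explained = 100 * rsq)
}

# Stand-in object with made-up numbers so the sketch runs without fitting a forest
mock_fit <- list(y = c(1, 2, 3, 4, 5),
                 predicted = c(1.1, 1.9, 3.2, 3.8, 5.0))
metrics_demo <- oob_metrics(mock_fit)
metrics_demo
```

With the objects from the question, the call would be oob_metrics(rfPar, dat$carat). If the assumption holds, the result should land close to the values printed for the single 500-tree forest rather than the more optimistic in-sample numbers.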
answered Nov 17 '25 08:11 by Ian Wesley


