R for loop over randomForest

I have an R data frame with 9 input variables and 1 output variable. I want to find the accuracy of randomForest using each input variable on its own, and store the accuracies in a vector. To do this, I need to loop over a list of formulas, as in the code below:

library(randomForest)
library(caret)
formulas = c(target ~ age, target ~ sex, target ~ cp, 
             target ~ trestbps, target ~ chol, target ~ fbs, 
             target ~ restecg, target ~ ca, target ~ thal)

test_idx = sample(dim(df)[1], 60)
test_data = df[test_idx, ]
train_data = df[-test_idx, ]

accuracies = rep(NA, 9)

for (i in 1:length(formulas)){
  rf_model = randomForest(formulas[i], data=train_data)
  prediction = predict(rf_model, newdata=test_data, type="response")
  acc = confusionMatrix(test_data$target, prediction)$overall[1]
  accuracies[i] = acc
}

I run into the following error:

Error in if (n==0) stop("data (x) has 0 rows") : argument is of length zero
Calls: ... eval -> eval -> randomForest -> randomForest.default
Execution halted

The error is related to the formulas[i] argument passed to randomForest: when I type the formula directly as the argument (for example, rf_model = randomForest(target ~ age, data=train_data)), there is no error.
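A quick base-R check (no packages needed) shows what single- and double-bracket indexing return on a list of formulas. The two iris formulas below are only illustrative stand-ins for the formulas in the question.

```r
# A list of formulas, as used in the loop
formulas <- list(Species ~ Sepal.Length, Species ~ Petal.Length)

one_bracket <- formulas[1]    # single brackets: a sub-list of length 1
two_bracket <- formulas[[1]]  # double brackets: the formula object itself

class(one_bracket)  # "list"
class(two_bracket)  # "formula"

# randomForest() dispatches on the class of its first argument, so only a
# "formula" object reaches the formula method; a list goes to the default method
inherits(one_bracket, "formula")  # FALSE
inherits(two_bracket, "formula")  # TRUE
```

This matches the traceback above, which ends in randomForest.default rather than the formula method.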

Is there any other way to iterate over randomForest?

Thank you!

asked Oct 15 '25 07:10 by Mridula Gunturi



1 Answer

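As you have not provided any data, I am using the iris dataset. You have to make two changes to your code to make it run: first, use list() to store the formulas, and second, use formulas[[i]] inside the for loop. The [[i]] change is what fixes the error: formulas[i] returns a one-element list rather than the formula itself, so randomForest() dispatches to randomForest.default(), which treats that list as the predictor data x. nrow() of a list is NULL, so the check if (n==0) fails with "argument is of length zero". You can use the following code: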

library(randomForest)
library(caret)

df <- iris
formulas = list(Species ~ Sepal.Length, Species ~ Petal.Length, Species ~ Petal.Width, 
                Species ~ Sepal.Width)

test_idx = sample(dim(df)[1], 60)
test_data = df[test_idx, ]
train_data = df[-test_idx, ]

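accuracies = rep(NA, length(formulas))

for (i in seq_along(formulas)){
  # [[i]] extracts the formula itself; [i] would return a one-element list
  rf_model = randomForest(formulas[[i]], data=train_data)
  prediction = predict(rf_model, newdata=test_data, type="response")
  # confusionMatrix(data, reference): predictions first, true labels second
  acc = confusionMatrix(prediction, test_data$Species)$overall["Accuracy"]
  accuracies[i] = acc
}

accuracies  # the test split is random (no set.seed()), so values vary between runs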
#> 0.7000000 0.9166667 0.9166667 0.5000000
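As another way to iterate over randomForest, here is a sketch that builds the single-predictor formulas from the column names with stats::reformulate() and loops with sapply(), so each formula is passed in directly and no indexing is involved. Assumptions not in the answer above: the fixed seed (for reproducibility), and replacing caret's confusionMatrix() with mean(prediction == truth), which computes the same overall accuracy without needing caret.

```r
library(randomForest)

set.seed(42)  # assumption: fixed seed so the split and the forests are reproducible
df <- iris

# One single-predictor formula per input column, e.g. Species ~ Sepal.Length
predictors <- setdiff(names(df), "Species")
formulas <- lapply(predictors, function(p) reformulate(p, response = "Species"))
names(formulas) <- predictors

test_idx <- sample(nrow(df), 60)
test_data <- df[test_idx, ]
train_data <- df[-test_idx, ]

# sapply() hands each list element (a formula) to the function as f
accuracies <- sapply(formulas, function(f) {
  rf_model <- randomForest(f, data = train_data)
  prediction <- predict(rf_model, newdata = test_data, type = "response")
  # proportion of correct predictions, i.e. the overall accuracy
  mean(prediction == test_data$Species)
})

accuracies  # named numeric vector, one accuracy per predictor
```

Because the result is named by predictor, it is easy to see which single input gives the best accuracy, e.g. with sort(accuracies, decreasing = TRUE). A plain for loop with formulas[[i]], as in the answer, works equally well.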
answered Oct 18 '25 05:10 by Bappa Das



