
missing values in object - Random Forest Confusion Matrix in R

I'm trying to obtain the confusion matrix after fitting a random forest model, with no success. Using the same code with a decision tree instead, there was no problem. Here is my code:

library(caret)
library(randomForest)

training <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv", na.strings=c("#DIV/0!"), row.names = 1)

to_exclude <- nearZeroVar(training)
training <- training[, -to_exclude]

set.seed(1234)
train_idx <- createDataPartition(training$classe, p = 0.8, list = FALSE)
train <- training[train_idx,]
validation <- training[-train_idx,]

rf_model <- randomForest(classe ~ . , data=train, method="class")
rf_validation <- predict(rf_model, validation, type="class")

confusionMatrix(rf_validation, validation$classe)

That's the error:

Error in na.fail.default(list(classe = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : missing values in object

I also tried this:

table(rf_validation, validation$classe)

This resulted in the same error. If I fit an rpart decision tree instead:

dt_model <- rpart(classe ~ ., data=train, method="class")

everything works fine.

What am I missing?

Asked by pceccon on Oct 28 '25 at 05:10



1 Answer

As mentioned by @lukeA, the problem was caused by NA values. Another option that worked for me was to clean the data a bit further:

training <- training[, colSums(is.na(training)) == 0]

This removes every feature (column) that contains at least one NA value.
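To see what that filter does without downloading the full dataset, here is a minimal sketch on a toy data frame. The data frame `toy` and its column values are made up for illustration (only `classe` borrows its name from the real data), and the example uses base R only:

```r
# Toy data frame standing in for the training set (hypothetical values).
toy <- data.frame(
  classe   = factor(c("A", "B", "A", "B")),
  roll     = c(1.2, 0.4, 3.1, 2.2),
  kurtosis = c(NA, 0.5, NA, NA),   # mostly missing
  pitch    = c(8.1, 7.9, NA, 8.3)  # a single missing value
)

# Count missing values per column to see which features are affected.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the columns that contain no NA at all.
toy_clean <- toy[, na_counts == 0, drop = FALSE]
print(names(toy_clean))

# Sanity check: nothing missing remains, so na.fail has nothing to reject.
stopifnot(!anyNA(toy_clean))
```

Note that this filter is aggressive: `pitch` is dropped even though it has only one missing value. If that loses too much information, it may be worth looking at row-wise filtering with `complete.cases()`, or at passing `na.action = na.omit` or `na.action = na.roughfix` to `randomForest()`. I have not tried those on this dataset.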

Answered by pceccon on Oct 31 '25 at 12:10
