
randomForest error: NA not permitted in predictors

Hi, I am using the following R script to build a random forest:

# load the necessary libraries
library(randomForest)

testPP <- numeric()

# load the dataset
QdataTrain <- read.csv('train.csv', header = FALSE)
QdataTest <- read.csv('test.csv', header = FALSE)

# split into predictors (all columns except V1) and class labels (V1)
QdataTrainX <- subset(QdataTrain, select = -V1)
QdataTrainY <- as.factor(QdataTrain$V1)

QdataTestX <- subset(QdataTest, select = -V1)
QdataTestY <- as.factor(QdataTest$V1)

# fit the model
mdl <- randomForest(QdataTrainX, QdataTrainY)

When I run it, I get the following error:

Error in randomForest.default(QdataTrainX, QdataTrainY) : 
  NA not permitted in predictors

However, I see no occurrence of NA in my data.

For reference, here is my data:

https://docs.google.com/file/d/0B0iDswLYaZ0zUFFsT01BYlRZU0E/edit

Does anyone know why this error is being thrown? I'll keep looking in the meantime. Thanks in advance for any help!

asked Dec 03 '25 18:12 by brucezepplin


1 Answer

The data does contain missing values, seven in total. Counting the NAs in each predictor column shows where they are:

sapply(QdataTrainX, function(x) sum(is.na(x)))

##  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 
##   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
## V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 
##   0   0   0   0   0   0   1   1   1   1   1   1   1 

So columns V23 to V29 have one missing value each. To locate it in V23:

which(is.na(QdataTrainX$V23))

## 318

This gives the row number of the missing value in V23, which is row 318.
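Once the offending row is known, one possible fix is to drop incomplete rows before fitting. The sketch below is only an illustration: it uses a small made-up data frame named train as a stand-in for QdataTrain (the values are hypothetical, not taken from the linked file) and relies on base R's complete.cases.

```r
# Toy stand-in for QdataTrain (hypothetical values, not the asker's data)
train <- data.frame(
  V1 = c("a", "b", "a", "b"),
  V2 = c(1.0, 2.0, 3.0, 4.0),
  V3 = c(0.5, NA, 1.5, 2.5)
)

# Count NAs per column, as in the check above
na_counts <- sapply(train, function(x) sum(is.na(x)))

# Rows that contain at least one NA
bad_rows <- which(!complete.cases(train))

# Keep only complete rows, then split into predictors and class labels
train_clean <- train[complete.cases(train), ]
trainX <- subset(train_clean, select = -V1)
trainY <- as.factor(train_clean$V1)

# trainX now contains no NAs, so a call such as
# randomForest(trainX, trainY) should get past the NA check
```

Filtering the full data frame before splitting keeps the predictors and the labels aligned. If losing rows is not acceptable, the randomForest package also provides na.roughfix, which imputes missing values with column medians (or modes for factors); whether that is appropriate depends on the data.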

answered Dec 06 '25 08:12 by Nishanth


