I have a training_predictors set with 56 columns, all of which are numeric. training_labels is a factor vector of 0 and 1.
I am testing the following subset sizes:
subset_sizes <- c(1:5, 10, 15, 20, 25)
This is my modified version of the rfFuncs list:
rfRFE <- list(summary = defaultSummary, 
              fit = function(x, y, first, last, ...) {
                  library(randomForest)
                  randomForest(x, y, importance = first, ...)
              }, 
              pred = function(object, x) predict(object, x), 
              rank = function(object, x, y) {
                  vimp <- varImp(object)
                  vimp <- vimp[order(vimp$Overall, decreasing = TRUE),,drop = FALSE]
                  vimp$var <- rownames(vimp)
                  vimp
              }, 
              selectSize = pickSizeBest, 
              selectVar = pickVars)
I have declared the control function as:
rfeCtrl <- rfeControl(functions = rfRFE, 
                      method = "cv", 
                      number = 10, 
                      verbose = TRUE)
But when I run rfe as shown below,
rfProfile <- rfe(training_predictors, 
                 training_labels, 
                 sizes = subset_sizes, 
                 rfeControl = rfeCtrl)
I get this error:
Error in { : task 1 failed - "argument 1 is not a vector"
I also tried changing the subset_sizes vector, but with no luck. What am I doing wrong?
Update: I ran these steps one by one, and the problem seems to be in the rank function, but I still cannot figure out what it is.
Update: I found the problem. The result of varImp in the rank function does not contain an Overall column; instead it has columns named 0 and 1. Why is that? What do 0 and 1 signify (the values in both columns are exactly the same, by the way)? Also, how can I make varImp return an Overall column? [As a temporary workaround, I am creating a new Overall column and attaching it to vimp in the rank function.]
Using 0 and 1 as factor levels is problematic because they are not valid R column names. In your other SO post, you probably received a message about using these as factor levels for your outcome.
Try a factor outcome with more informative levels that can be translated into valid R column names (for class probabilities).
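As a rough sketch of that relabeling, something like the following should work in base R. The small training_labels vector here is only a stand-in for your real one, and the names "neg" and "pos" are placeholders I made up; pick whatever is meaningful for your data.

```r
# Hypothetical stand-in for the real training_labels (factor with levels "0" and "1").
training_labels <- factor(c(0, 1, 1, 0, 1))

# "0" and "1" are not syntactically valid names; make.names() shows
# what R would have to turn them into to use them as column names.
make.names(levels(training_labels))

# Relabel with descriptive, syntactically valid levels.
# "neg" and "pos" are assumed names chosen for illustration.
training_labels <- factor(training_labels,
                          levels = c("0", "1"),
                          labels = c("neg", "pos"))

levels(training_labels)
```

After relabeling, rerun rfe with the new training_labels and check the column names that varImp returns inside your rank function.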