I am working with a very large .csv dataset for an evaluation, and I get the following warning and error:
Warning in preProcess.default(data, method = c("center", "scale")) :
These variables have zero variances: num_outbound_cmds, is_host_login
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
What is the quickest way to exclude variables in my dataset whose variance is zero (0)?
The R package caret has a function nearZeroVar that does a pretty good job of identifying columns in a matrix or data frame that have zero or near zero variance. It returns the indices as a vector, which you can use to remove those columns.
> library(caret)
> df <- data.frame(a=1:5, b=sample(1:5), c=rep(1,5))
> df
  a b c
1 1 4 1
2 2 2 1
3 3 1 1
4 4 5 1
5 5 3 1
> nearZeroVar(df)
[1] 3
> df[,-nearZeroVar(df)]
  a b
1 1 4
2 2 2
3 3 1
4 4 5
5 5 3
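Two caveats about the one-liner above are worth a sketch. If nearZeroVar(df) finds nothing, it returns an empty vector, and negative indexing with an empty vector selects zero columns rather than all of them. Also, with its default settings nearZeroVar flags near-zero-variance columns as well as strictly constant ones, which may be more than the question asks for. The following base R sketch covers both points under stated assumptions: the helper name drop_cols is hypothetical, and the values of b are fixed here instead of sampled so the result is reproducible.

```r
# Toy data frame mirroring the example above; column c is constant.
# (b is written out explicitly instead of sample(1:5) for reproducibility.)
df <- data.frame(a = 1:5, b = c(4, 2, 1, 5, 3), c = rep(1, 5))

# Option 1: drop only columns whose variance is exactly zero, using base R.
# Assumes all columns are numeric; var() returns NA for an all-NA column,
# and such columns are dropped here as well.
col_var <- vapply(df, var, numeric(1))
keep <- !is.na(col_var) & col_var > 0
df_clean <- df[, keep, drop = FALSE]

# Option 2: a hypothetical helper that removes columns by index but leaves
# the data frame untouched when the index vector is empty.
drop_cols <- function(data, idx) {
  if (length(idx) == 0) {
    return(data)  # nothing flagged, so keep every column
  }
  data[, -idx, drop = FALSE]
}

print(names(df_clean))
print(names(drop_cols(df, integer(0))))
```

With caret loaded, the guarded form would be drop_cols(df, nearZeroVar(df)). The drop = FALSE argument keeps the result a data frame even when only one column survives.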