
Error with knnImputation from the DMwR package: invalid 'times' argument

Tags:

r

I'm trying to run knnImputation from the DMwR package on a genomic dataset. The dataset has two columns: one for location on a chromosome (numeric, an integer) and one for methylation values (also numeric, double). Many of the methylation values are missing. The idea is that distance should be based on location in the chromosome. (I also have several other features, but chose not to include those.) When I run the following line, however, I get an error.

reg.knn <- knnImputation(as.matrix(testp), k=2, meth="median")
#ERROR:
#Error in rep(1, ncol(dist)) : invalid 'times' argument

Any thoughts on what could be causing this? If this doesn't work, does anyone know of any other good KNN imputers in R packages? I've tried several, but each returns some kind of error.

Marti asked Oct 16 '25 03:10


2 Answers

I got a similar error today:

Error in rep(1, ncol(dist)) : invalid 'times' argument

I could not find a solution online, but after some trial and error I think the issue is the number of columns in the data frame.

Try passing at least 3 columns to knnImputation.

I created a dummy third column that holds the row count of each observation.

It worked for me!
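A minimal sketch of that workaround, assuming a two-column data frame like the one in the question. The names `testp`, `location`, `methylation` and `row_id`, and all the values, are placeholders and not from the original post. The `knnImputation` call is guarded so the snippet still runs when DMwR is not installed.

# Hypothetical stand-in for the asker's data: two numeric columns, one with NAs.
testp <- data.frame(
  location    = c(101L, 250L, 377L, 512L, 640L, 803L),
  methylation = c(0.12, NA, 0.45, 0.50, NA, 0.81)
)

# Add a dummy third column holding the row count of each observation.
testp$row_id <- seq_len(nrow(testp))

# Only attempt the imputation if DMwR is available on this machine.
if (requireNamespace("DMwR", quietly = TRUE)) {
  reg.knn <- DMwR::knnImputation(testp, k = 2, meth = "median")
  print(reg.knn)
}

Keep in mind that the dummy column presumably takes part in the distance calculation along with the real columns, so it may change which rows are treated as nearest neighbors.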


Examples for your reference:

Example 1 -

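library(DMwR)
temp <- data.frame(X = c(1,2,3,4,5,6,7,8,9,10), Y = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, NA, NA, TRUE, TRUE))
# Two columns only: this call fails with the error shown below.
temp7 <- knnImputation(temp, scale=TRUE, k=3, meth='median', distData = NULL)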

Error in rep(1, ncol(dist)) : invalid 'times' argument

Example 2 -

library(DMwR)
temp <- data.frame(X = 1:10, Y = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, NA, TRUE, TRUE, TRUE), Z = c(NA,NA,7,8,9,5,11,9,9,4))
# Three columns: this call completes.
temp7 <- knnImputation(temp, scale=TRUE, k=3, meth='median', distData = NULL)

Here the number of columns passed is 3, and I did not get any error.
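One plausible reading of the difference between Example 1 and Example 2, which is my assumption and not something confirmed in this thread, is that the function ends up with a plain vector where it expects a matrix. In base R, subsetting a matrix down to a single column drops it to a vector, `ncol()` of a vector is `NULL`, and `rep(1, NULL)` raises exactly this message. The sketch below uses only base R and made-up values to show that behavior. It does not call DMwR.

# A complete-case matrix with two columns, as in Example 1.
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)

# Dropping the one column that is missing in the target row leaves a
# single column, which R simplifies to a plain vector.
one_col <- m[, -2]
print(is.null(ncol(one_col)))  # a vector has no column count

# rep() with a NULL 'times' argument fails; capture the message.
msg <- tryCatch(rep(1, ncol(one_col)), error = function(e) conditionMessage(e))
print(msg)

# With three columns, two remain after dropping one, so ncol() is defined.
m3 <- cbind(m, c(7, 8, 9))
print(ncol(m3[, -2]))

If this reading is right, any row that leaves at most one usable column after removing its missing entries would trigger the same failure, regardless of how wide the data frame is.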

Bhavani answered Oct 18 '25 20:10


Today, I encountered the same error. My data frame had far more than 3 columns, so the column count does not seem to be the (only?) problem.

I found that rows with too many NAs caused the problem (in my case, more than 95% of a given row was NA). Filtering out this row solved the problem.

Take-home message: do not only filter for NAs over the columns (which I did), but also check the rows (it is of course impossible to impute by kNN if you cannot define what exactly a nearest neighbor is).
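The row check can be sketched in base R as follows. The data frame `df`, its values, and the 0.5 threshold are all made up for illustration, so pick a cutoff that suits your own data.

# Hypothetical data: row 3 is almost entirely NA.
df <- data.frame(
  a = c(1.0, 2.0, NA, 4.0, 5.0),
  b = c(0.5, NA, NA, 0.7, 0.9),
  c = c(10, 11, NA, 13, 14),
  d = c(3, 4, 8, NA, 6)
)

# Fraction of missing values in each row.
na_frac <- rowMeans(is.na(df))
print(na_frac)

# Keep only rows at or below an illustrative threshold of missingness.
max_na_frac <- 0.5
df_filtered <- df[na_frac <= max_na_frac, , drop = FALSE]
print(df_filtered)

The filtered data frame still contains NAs in some rows, which is what you would then hand to the imputation function.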

It would be nice if the package provided a readable error message!

MartijnM answered Oct 18 '25 20:10


