
Using order(colSums()) in R

Tags: sorting, r, na

I have a data frame in R whose columns I want to order by their column sums, in decreasing order. My values range from -1 to +1. This code does that nicely:

DF<-DF[, order(colSums(-DF))]

However, I have some NA values scattered through the data (no column or row is entirely NA, so I can't simply drop a whole column or row). I believe the data isn't being sorted properly: columns that contain NAs are not sorted at all, but are just placed after the sorted columns.
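A minimal reproduction on a made-up data frame (the values below are hypothetical, not the asker's data) shows why this happens: `colSums()` returns NA for any column that contains an NA, and `order()` puts NA keys last by default.

```r
# Hypothetical toy data: columns x and z each contain one NA, y has none.
DF <- data.frame(x = c(1, NA, -1),
                 y = c(0.5, 0.5, 0.5),
                 z = c(NA, 1, 1))

# A single NA makes the whole column sum NA (x and z here).
colSums(-DF)

# order() places NA keys last by default (na.last = TRUE), so x and z
# follow y and keep their original relative order instead of being ranked.
order(colSums(-DF))
# [1] 2 1 3
```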

Is there a way to order the data by column sums as above, while also sorting the columns that contain NAs?

Asked by Ryan Rothman, Dec 06 '25 14:12



2 Answers

If I understand you correctly, you want to sort the "NA columns" after the "non-NA columns", but you also want to sort the NA columns among themselves by the sum of their non-NA cells. You can do this by passing a second key to order() as a tiebreaker: colSums() called with the additional argument na.rm=TRUE. Here's a demo with 4 columns in total, 2 with NAs and 2 without:

set.seed(3L)
df <- setNames(rev(as.data.frame(replicate(4L,
     sample(c(seq(-1,1,0.5),NA),
            5L,replace=TRUE)))),letters[1:4])
df ## columns a and b are "NA columns", columns c and d are "non-NA columns"
##      a   b    c    d
## 1  1.0 0.5  0.5 -0.5
## 2 -1.0 0.5 -1.0  1.0
## 3  1.0 0.5 -0.5  0.0
## 4   NA 0.5  0.5 -0.5
## 5 -0.5  NA  0.5  0.5
colSums(-df) ## d should be moved before c, but can't tell yet about a and b
##    a    b    c    d
##   NA   NA  0.0 -0.5
colSums(-df,na.rm=TRUE) ## this can tiebreak a and b; b should be moved before a
##    a    b    c    d
## -0.5 -2.0  0.0 -0.5
df[,order(colSums(-df))] ## fails to order NA columns
##      d    c    a   b
## 1 -0.5  0.5  1.0 0.5
## 2  1.0 -1.0 -1.0 0.5
## 3  0.0 -0.5  1.0 0.5
## 4 -0.5  0.5   NA 0.5
## 5  0.5  0.5 -0.5  NA
df[,order(colSums(-df),colSums(-df,na.rm=TRUE))] ## tiebreaker orders NA columns properly
##      d    c   b    a
## 1 -0.5  0.5 0.5  1.0
## 2  1.0 -1.0 0.5 -1.0
## 3  0.0 -0.5 0.5  1.0
## 4 -0.5  0.5 0.5   NA
## 5  0.5  0.5  NA -0.5
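The same rule (columns without NAs first, then columns with NAs, each group sorted by decreasing sum) can also be spelled out with an explicit `is.na()` key. This is a sketch on hypothetical toy data, not part of the original answer; on this data it gives the same column order as the tiebreaker version above.

```r
# Hypothetical toy data: x and z contain an NA, y does not.
DF <- data.frame(x = c(1, NA, -1),
                 y = c(0.5, 0.5, 0.5),
                 z = c(NA, 1, 1))

full_sums  <- colSums(DF)               # NA for any column containing an NA
rm_na_sums <- colSums(DF, na.rm = TRUE) # sums of the non-NA cells only

# Primary key: FALSE (no NA) sorts before TRUE (has NA).
# Secondary key: decreasing sum of the available values.
ord <- order(is.na(full_sums), -rm_na_sums)
DF[, ord]   # columns in the order y, z, x
```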

Sorry, I misunderstood. Looks like this is what you're looking for:

df[,order(colSums(-df,na.rm=TRUE))]
##     b    a    d    c
## 1 0.5  1.0 -0.5  0.5
## 2 0.5 -1.0  1.0 -1.0
## 3 0.5  1.0  0.0 -0.5
## 4 0.5   NA -0.5  0.5
## 5  NA -0.5  0.5  0.5

Note that, for computing column sums, passing na.rm=TRUE is equivalent to treating NAs as zero, contrary to your concern that treating NAs as zero would mess up the sorting.
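A quick check of that equivalence, as a sketch on hypothetical toy data: replacing every NA with 0 and then summing gives the same column sums as `colSums(..., na.rm = TRUE)`, so both lead to the same ordering.

```r
# Hypothetical toy data with scattered NAs.
DF <- data.frame(x = c(1, NA, -1),
                 y = c(0.5, 0.5, 0.5),
                 z = c(NA, 1, 1))

DF_zero <- DF
DF_zero[is.na(DF_zero)] <- 0   # replace every NA with 0

identical(colSums(DF, na.rm = TRUE), colSums(DF_zero))
# [1] TRUE
```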

Answered by bgoldst, Dec 08 '25 05:12



To sort the columns that contain NAs together with the other columns, pass na.rm=TRUE to colSums(). Without it, colSums() returns NA for every column that contains an NA, and order() places those NA keys last, unsorted. With na.rm=TRUE, each column's sum is computed from its non-NA values only, so every column gets a numeric key and is ranked normally. The final code is:

DF <- DF[, order(colSums(-DF, na.rm=TRUE))]
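A self-contained sketch of this fix on hypothetical toy data. Writing `TRUE` rather than `T` is a little safer, since `T` is an ordinary variable that can be reassigned. Instead of negating the data, you can also pass `decreasing = TRUE` to `order()`; on this data (which has no tied sums) both give the same column order.

```r
# Hypothetical toy data: x and z contain an NA, y does not.
DF <- data.frame(x = c(1, NA, -1),
                 y = c(0.5, 0.5, 0.5),
                 z = c(NA, 1, 1))

# Negate the values and sort the (NA-skipping) sums in ascending order...
DF1 <- DF[, order(colSums(-DF, na.rm = TRUE))]

# ...or sort the plain sums in decreasing order.
DF2 <- DF[, order(colSums(DF, na.rm = TRUE), decreasing = TRUE)]

names(DF1)
# [1] "z" "y" "x"
```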
Answered by Ryan Rothman, Dec 08 '25 04:12



