
In R, how can I elegantly compute the medians for multiple columns, and then count the number of cells in each row that exceed the median?

Tags: r
Suppose I have the following data frame:

Base Coupled Derived Decl
   1       0       0    1
   1       7       0    1
   1       1       0    1
   2       3      12    1
   1       0       4    1

Here is the dput output:

temp <- structure(list(Base = c(1L, 1L, 1L, 2L, 1L),
                       Coupled = c(0L, 7L, 1L, 3L, 0L),
                       Derived = c(0L, 0L, 0L, 12L, 4L),
                       Decl = c(1L, 1L, 1L, 1L, 1L)),
                  .Names = c("Base", "Coupled", "Derived", "Decl"),
                  row.names = c(NA, 5L), class = "data.frame")

I want to compute the median for each column. Then, for each row, I want to count the number of cell values greater than the median for their respective columns and append this as a column called AboveMedians.

In the example, the medians would be c(1,1,0,1). The table I want as a result is:

Base Coupled Derived Decl AboveMedians
   1       0       0    1            0
   1       7       0    1            1
   1       1       0    1            0
   2       3      12    1            3
   1       0       4    1            1

What is the elegant R way to do this? I have something involving a for-loop and sapply, but this doesn't seem optimal.

Thanks.

user2145843 asked Jan 25 '26 09:01


2 Answers

We can use colMedians from matrixStats after converting the data.frame to a matrix.

library(matrixStats)
Medians <- colMedians(as.matrix(temp))
Medians
#[1] 1 1 0 1

Then, index 'Medians' with col(temp) so that each median is repeated once for every cell in its column, compare cell by cell against 'temp', and take the rowSums of the resulting logical matrix.

temp$AboveMedians <- rowSums(temp > Medians[col(temp)])
temp$AboveMedians
#[1] 0 1 0 3 1
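
To see why indexing with col(temp) lines each median up with the right cells, it may help to look at the intermediate objects one at a time. The following is only a sketch: it uses sapply(temp, median) from base R in place of colMedians so that no extra package is needed (that substitution is an assumption of this example, not part of the answer), and the names idx, expanded, above and counts are made up for illustration.

```r
temp <- data.frame(Base    = c(1L, 1L, 1L, 2L, 1L),
                   Coupled = c(0L, 7L, 1L, 3L, 0L),
                   Derived = c(0L, 0L, 0L, 12L, 4L),
                   Decl    = c(1L, 1L, 1L, 1L, 1L))

# Column medians as a plain vector (base R stand-in for colMedians)
Medians <- unname(sapply(temp, median))

# col(temp) is an integer matrix holding each cell's column index
idx <- col(temp)

# Indexing by idx repeats each median down its own column; the result is a
# flat vector in column-major order, reshaped here to the shape of temp
expanded <- matrix(Medians[idx], nrow = nrow(temp))

# Cell-wise comparison, then count the TRUE values in each row
above  <- as.matrix(temp) > expanded
counts <- rowSums(above)
```

Printing idx and expanded side by side shows that every cell of temp ends up compared with the median of its own column.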

Alternatively, a base R only option is:

apply(temp, 2, median)
# Base Coupled Derived    Decl 
#   1       1       0       1 

rowSums(sweep(temp, 2, apply(temp, 2, median), FUN = ">"))
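
One thing to watch for: if AboveMedians has already been appended to temp, taking medians over the whole data frame would include that new column as well. A minimal sketch that restricts the computation to the original columns and assigns the result could look like this (value_cols and meds are hypothetical names introduced here, not part of the answer):

```r
temp <- data.frame(Base    = c(1L, 1L, 1L, 2L, 1L),
                   Coupled = c(0L, 7L, 1L, 3L, 0L),
                   Derived = c(0L, 0L, 0L, 12L, 4L),
                   Decl    = c(1L, 1L, 1L, 1L, 1L))

# Columns that take part in the comparison (assumed: the four originals)
value_cols <- c("Base", "Coupled", "Derived", "Decl")

# One median per column, returned as a named numeric vector
meds <- vapply(temp[value_cols], median, numeric(1))

# sweep() compares every column against its own median; rowSums() then
# counts the TRUE values in each row
temp$AboveMedians <- rowSums(
  sweep(as.matrix(temp[value_cols]), 2, meds, FUN = ">")
)
```

Because value_cols never includes AboveMedians, the last statement can be re-run without the count column feeding back into the medians.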

akrun answered Jan 26 '26 23:01


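Another alternative, using by_row (originally part of purrr, since moved to the purrrlyr package):

library(dplyr)
library(purrrlyr)  # by_row() now lives here rather than in purrr

temp %>%
  by_row(function(x) {
    # compare the current row with the column medians of the full data;
    # summarise_all() replaces the deprecated summarise_each(., funs(median))
    sum(x > summarise_all(., median)) },
    .to = "AboveMedian",
    .collate = "cols"
    )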

Which gives:

#Source: local data frame [5 x 5]
#
#   Base Coupled Derived  Decl AboveMedian
#  <int>   <int>   <int> <int>       <int>
#1     1       0       0     1           0
#2     1       7       0     1           1
#3     1       1       0     1           0
#4     2       3      12     1           3
#5     1       0       4     1           1

Steven Beaupré answered Jan 26 '26 23:01


