
Select columns which have a specific value in data.table

Tags: r, data.table

Minimal example:

dt <- data.table(a=c(1,2,3),b=c(4,5,6))

It looks like this:

> dt
   a b
1: 1 4
2: 2 5
3: 3 6

Suppose I want to select the column that contains the value 6. In this toy example it's easy, since we know which column that is:

> dt[,.(b)]
   b
1: 4
2: 5
3: 6

Now what if this dt had several thousand columns and we didn't know which one holds the 6?

I tried this:

> dt[,.SD==6]
         a     b
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE  TRUE

and this:

> dt[,lapply(.SD,`==`,6)]
       a     b
1: FALSE FALSE
2: FALSE FALSE
3: FALSE  TRUE

and also this:

> dt[,lapply(.SD,function(x) any(x==6))]
       a    b
1: FALSE TRUE

But I can't get the original column back:

   b
1: 4
2: 5
3: 6

asked by moth, Oct 24 '25 03:10


2 Answers

Hopefully there is a more elegant solution, but in the meantime:

dt[, sapply(dt, function(x) any(x == 6)), with = FALSE]
   b
1: 4
2: 5
3: 6
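
For reference, the expression above works because `sapply()` returns one named logical per column, and `with = FALSE` makes data.table treat `j` as a column selector rather than an expression to evaluate. Below is a minimal sketch of two equivalent spellings. It assumes a data.table version recent enough to support the `..` prefix and a function in `.SDcols`; the names `has_six`, `cols`, `res_names` and `res_sdcols` are illustrative only.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))

# One TRUE/FALSE per column: does the column contain a 6 anywhere?
has_six <- sapply(dt, function(x) any(x == 6))

# Turn the logical vector into column names, then select by name
cols <- names(dt)[has_six]
res_names <- dt[, ..cols]

# Same selection, passing the predicate straight to .SDcols
res_sdcols <- dt[, .SD, .SDcols = function(x) any(x == 6)]
```

Keeping the names in `cols` can be handy if the matching columns are needed again later, for example to report which of several thousand columns matched.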

Here's a quick benchmark, since data.table is often used for speed:

library(data.table)

n <- 1000000
# six numeric columns holding whole numbers between 0 and 100
dt <- data.table(
    V1 = round(runif(n) * 100), V2 = round(runif(n) * 100),
    V3 = round(runif(n) * 100), V4 = round(runif(n) * 100),
    V5 = round(runif(n) * 100), V6 = round(runif(n) * 100)
)

bench <- microbenchmark::microbenchmark(
    user438383 = dt[, sapply(dt, function(x) any(x == 6)), with = FALSE],
    Wimpel = dt[, colSums(dt == 6) > 0, with = FALSE],
    times = 10000
)

answered by user438383, Oct 25 '25 16:10


dt[, colSums(dt == 6) > 0, with = FALSE]
#    b
# 1: 4
# 2: 5
# 3: 6
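
Both answers assume the columns contain no missing values. If they might, the comparison `x == 6` yields `NA` for those cells, and the per-column flag can become `NA` as well. Below is a hedged sketch of one way to guard against that with `na.rm = TRUE`. The toy table `dt_na` and the names `raw_flags`, `keep` and `res` are made up for illustration.

```r
library(data.table)

# Toy data with a missing value in the column that has no 6 (illustrative only)
dt_na <- data.table(a = c(1, NA, 3), b = c(4, 5, 6))

# Without na.rm, the flag for column a is NA rather than FALSE
raw_flags <- sapply(dt_na, function(x) any(x == 6))

# na.rm = TRUE ignores the NA comparisons when counting matches per column
keep <- colSums(dt_na == 6, na.rm = TRUE) > 0
res <- dt_na[, keep, with = FALSE]
```

The same guard applies to the `sapply()` variant by writing `any(x == 6, na.rm = TRUE)`.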

answered by Wimpel, Oct 25 '25 18:10