Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get indices of top k values for each (selected) column in data.table

How to find the indices of the top k (say k=3) values for each column

> dt <- data.table( x = c(1, 1, 3, 1, 3, 1, 1), y = c(1, 2, 1, 2, 2, 1, 1) )
> dt
   x y
1: 1 1
2: 1 2
3: 3 1
4: 1 2
5: 3 2
6: 1 1
7: 1 1

Required output:

> output.1
   x y
1: 1 2
2: 3 4
3: 5 5

Or even better (notice the additional helpful descending sort in x):

> output.2
   var top1 top2 top3
1:   x    3    5    1
2:   y    2    4    5

Having the output would be already a great help.

like image 602
Amitai Avatar asked Oct 25 '25 05:10

Amitai


2 Answers

We can use sort (with index.return=TRUE) after looping over the columns of the dataset with lapply

dt[, lapply(.SD, function(x) sort(head(sort(x, 
          decreasing=TRUE, index.return=TRUE)$ix,3)))]
#   x y
#1: 1 2
#2: 3 4
#3: 5 5

Or use order

dt[, lapply(.SD, function(x) sort(head(order(-x),3)))]
like image 80
akrun Avatar answered Oct 26 '25 18:10

akrun


If the order of the elements having same rank doesn't matter then this answer would be also valid.
The order information can be extracted from data.table index.

library(data.table)
dt = data.table(x = c(1, 1, 3, 1, 3, 1, 1), y = c(1, 2, 1, 2, 2, 1, 1))
set2key(dt, x)
set2key(dt, y)

tail.index = function(dt, index, n){
    idx = attr(attr(dt, "index"), index)
    rev(tail(idx, n))
}

tail.index(dt, "__x", 3L)
#[1] 5 3 7
tail.index(dt, "__y", 3L)
#[1] 5 4 2
like image 24
jangorecki Avatar answered Oct 26 '25 19:10

jangorecki