Why can't I remove the current observation using .I in data.table?

Tags: r, data.table

Recently I saw a question (I can't find the link) that went something like this:

I want to add a column to a data.frame that computes the variance of another column with the current observation removed.

library(data.table)

dt = data.table(
  id = 1:13,
  v = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
)

With a for() loop, this is:

res = numeric(nrow(dt))
for(i in seq_len(nrow(dt))){
  res[i] = var(dt[-i, v])
}

I tried doing this in data.table, using negative indexing with .I, but to my surprise none of the following work:

#1
dt[,var := var(dt[,v][-.I])]

#2
dt[,var := var(dt$v[-.I])]

#3 
fun = function(x){
  v = c(9,5,8,1,25,14,7,87,98,63,32,12,15)
  var(v[-x])
}
dt[,var := fun(.I)]

#4
fun = function(x){
  var(dt[-x,v])
}
dt[,var := fun(.I)]

All of them give the same output:

    id  v var
 1:  1  9  NA
 2:  2  5  NA
 3:  3  8  NA
 4:  4  1  NA
 5:  5 25  NA
 6:  6 14  NA
 7:  7  7  NA
 8:  8 87  NA
 9:  9 98  NA
10: 10 63  NA
11: 11 32  NA
12: 12 12  NA
13: 13 15  NA

What am I missing? I thought the problem was passing .I to functions, but this dummy example:

fun = function(x,c){
  x*c
}
dt[,dummy := fun(.I,2)]

    id  v dummy
 1:  1  9     2
 2:  2  5     4
 3:  3  8     6
 4:  4  1     8
 5:  5 25    10
 6:  6 14    12
 7:  7  7    14
 8:  8 87    16
 9:  9 98    18
10: 10 63    20
11: 11 32    22
12: 12 12    24
13: 13 15    26

works fine.

Why can't I use .I in this specific scenario?

asked Dec 29 '25 at 04:12 by Fino


1 Answer

You may use .BY, which the data.table documentation describes as:

a list containing a length 1 vector for each item in by

dt[ , var_v := dt[id != .BY$id,  var(v)], by = id]

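With by = id, j is evaluated once per id, i.e. once per row, because the ids are unique. In each evaluation, the 'inner' call dt[id != .BY$id, var(v)] uses its i to keep every row except the current one and computes the variance of v over those rows.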
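The same per-id logic can be sketched in base R to make the exclusion step explicit. This is an illustration only, not data.table's internals: `id` and `v` copy the question's columns, while `var_v` and `loo_pos` are hypothetical names.

```r
# Base R sketch mirroring dt[id != .BY$id, var(v)] evaluated by = id.
# `var_v` and `loo_pos` are illustrative names, not data.table objects.
id <- 1:13
v  <- c(9, 5, 8, 1, 25, 14, 7, 87, 98, 63, 32, 12, 15)

# One evaluation per distinct id; keep every row whose id differs
var_v <- vapply(unique(id), function(g) var(v[id != g]), numeric(1))

# Positional leave-one-out (what the for() loop does), for comparison
loo_pos <- vapply(seq_along(v), function(i) var(v[-i]), numeric(1))

all.equal(var_v, loo_pos)
# [1] TRUE
```

The two agree here only because every id is unique. If id had duplicates, id != .BY$id would drop every row sharing the current id, not just one row.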

all.equal(dt$var_v, res)
# [1] TRUE

Why doesn't your code work? Because...

.I is an integer vector equal to seq_len(nrow(x)),

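...your -.I does not remove just the current observation. Without by, j is evaluated once for the whole table, so -.I is -(1:13) and removes all rows from 'v' in one go. The result is an empty vector, its variance is a single NA, and := recycles that NA to every row.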
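A base R sketch of what j sees in the question's attempts versus its dummy example. Here `idx` stands in for .I on the 13-row table (an assumption for illustration), and `rep_len()` only imitates the length-1 recycling that := performs.

```r
# `idx` plays the role of .I: the whole index vector at once
idx <- 1:13
v <- c(9, 5, 8, 1, 25, 14, 7, 87, 98, 63, 32, 12, 15)

# The dummy example is element-wise: one result per row
length(idx * 2)
# [1] 13

# The variance attempt drops all 13 rows at once ...
length(v[-idx])
# [1] 0

# ... so var() returns a single NA for the whole table
var_all <- var(v[-idx])
var_all
# [1] NA

# A length-1 result fills the whole column, hence 13 NAs
filled <- rep_len(var_all, length(idx))
```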

A small illustration which starts with your attempt (just without the assignment :=) and simplifies it step by step:

# your attempt
dt[ , var(dt[, v][-.I])]
# [1] NA

# without the `var`, indexing only
dt[ , dt[ , v][-.I]]
# numeric(0)
# an empty vector

# same indexing written in a simpler way
dt[ , v[-.I]]
# numeric(0)

# even more simplified, with a vector of values
# and its corresponding indexes (equivalent to .I)
v <- as.numeric(11:14)
i <- 1:4
v[i]
# [1] 11 12 13 14

v[-i]
# numeric(0)
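On the same toy vector, dropping one index at a time instead of all of them at once gives one variance per element, which is the pattern the question's for() loop follows. A minimal base R sketch; `loo` is a hypothetical name.

```r
v <- as.numeric(11:14)
i <- 1:4

# Drop a single index k per call, rather than all of i at once
loo <- vapply(i, function(k) var(v[-k]), numeric(1))
loo
# [1] 1.000000 2.333333 2.333333 1.000000
```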
answered Dec 31 '25 at 17:12 by Henrik