
data.table: how to filter lines with %in% when not all variables are inside the group?

Tags:

r

data.table

This may be a simple question and I may be missing something, but it is bugging me. Assume this example data.table:

library(data.table)
test <- data.table(group1 = "a", group2 = "z", value  = 1)

Why doesn't this work?

test[group1 %in% c("a", "b"), sum(value), group2]
Error in `[.data.table`(test, group1 %in% c("a", "b"), sum(value), group2) : 
  i[2] is 0. While grouping, i=0 is allowed when it's the only value. When length(i) > 1, all i should be > 0.

But this does:

test[group1 %in% c("a", "b"), ][,sum(value), group2]
   group2 V1
1:      z  1

Is this really the expected behavior?

Carlos Cinelli asked Feb 01 '26 16:02



2 Answers

Update: This behaviour is fixed in the development version, 1.9.5 and works as expected now.


Another workaround is to use data.table's built-in %chin% operator, a version of %in% for character vectors:

test[group1 %chin% c("a", "b"), sum(value), group2]
#    group2 V1
# 1:      z  1
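
As a small sketch of why this substitution is safe for a character column like group1: on character input, %chin% should return the same logical vector as %in%. The vector x below is made up for illustration and is not part of the original question.

```r
library(data.table)

# Hypothetical character vector, used only for illustration
x <- c("a", "b", "c")

# %chin% is data.table's character-only counterpart of %in%
chin_result <- x %chin% c("a", "b")
in_result   <- x %in% c("a", "b")

# Both operators should agree element-wise on character input
identical(chin_result, in_result)
```

Note that %chin% only accepts character vectors, so it is not a drop-in replacement when the filtered column is numeric or a factor.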
David Arenburg answered Feb 04 '26 05:02



This looks like a bug to me, but you can get your expected behavior with extra parentheses around the i:

test[(group1 %in% c("a", "b")), sum(value), group2]
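
The same pattern can be tried on a table with more than one group. This is only a sketch: the table dt and the column name total are hypothetical and do not come from the question.

```r
library(data.table)

# Hypothetical data with several groups, for illustration only
dt <- data.table(group1 = c("a", "b", "c", "a"),
                 group2 = c("z", "z", "y", "y"),
                 value  = 1:4)

# Parenthesised filter in i, named aggregate in j, explicit by=
res <- dt[(group1 %in% c("a", "b")), list(total = sum(value)), by = group2]
res
```

Here the row with group1 == "c" is dropped before grouping, so group "z" sums to 3 and group "y" sums to 4.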
Frank answered Feb 04 '26 05:02



