How data.table orders the processing of `.SD` and `.SDcols` with the `by` parameter

Tags: r, data.table

I'm new to data.table. I'm curious about when the content of the .SDcols parameter is processed in the case below. According to the documentation, the value column should not be passed in .SD, since I have only listed v1 in .SDcols. So, in theory, shouldn't this raise an error? I don't really understand why it works.

library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1 = c(1,2,3,4,5)
)
dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]
#>     group    v1
#>    <char> <num>
#> 1:      A     1
#> 2:      B     3

Created on 2025-06-25 with reprex v2.1.1

My guess is that it is handled like this:

  1. grouping is done first, based on by
  2. rows are filtered based on the information in .SD
  3. the columns listed in .SDcols are extracted

Looking forward to the clarification, thanks!

Breeze asked Dec 18 '25 15:12


1 Answer

Let's walk through the process step by step.

Content of .SD by group:

dt[, by=group,.SD, .SDcols = "v1"]
    group    v1
   <char> <num>
1:      A     1
2:      A     2
3:      B     3
4:      B     4
5:      B     5

OK, as expected. Let's add value now.

dt[, by=group, cbind(value, .SD), .SDcols = "v1"]
    group value    v1
   <char> <num> <num>
1:      A     3     1
2:      A     6     2
3:      B     1     3
4:      B     2     4
5:      B     4     5

Being able to do that means that the columns of dt are available in the j scope, as well as .SD. Let's add a filter condition.

dt[, by=group, cbind(filter=value==min(value), .SD), .SDcols = "v1"]
    group filter    v1
   <char> <lgcl> <num>
1:      A   TRUE     1
2:      A  FALSE     2
3:      B   TRUE     3
4:      B  FALSE     4
5:      B  FALSE     5

Pretty easy to see what's going to happen now :-)

dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]
    group    v1
   <char> <num>
1:      A     1
2:      B     3

So it's more like this:

  1. grouping is done first, based on by

  2. .SD is built from the current group's row subset, keeping only the columns in .SDcols, and "added" to the j scope alongside the group's other columns
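If relying on this scoping feels too implicit, the row filter can also be written without .SD. The sketch below is one possible alternative, not necessarily what the asker needs. It uses the `.I` symbol (row numbers within dt) to find the matching rows per group first, then selects the columns. The names `idx` and `res` are made up for illustration.

```r
library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1 = c(1, 2, 3, 4, 5)
)

# Step 1: per group, collect the row numbers (.I) where value is minimal.
# The unnamed result column is called V1 by default.
idx <- dt[, .I[value == min(value)], by = group]$V1

# Step 2: subset those rows and keep only the columns of interest.
res <- dt[idx, .(group, v1)]
print(res)
```

Here the filtering on value and the column selection are separate steps, so it is explicit that value is read from dt itself rather than from .SD.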

Billy34 answered Dec 21 '25 06:12


