Merging a sum by reference with data.table

Tags: r, data.table

Let's say I have two data.tables, dt_a and dt_b, defined as below.

library(data.table)
set.seed(20201111L)

dt_a <- data.table(
  foo = c("a", "b", "c")
)

dt_b <- data.table(
  bar = sample(c("a", "b", "c"), 10L, replace=TRUE),
  value = runif(10L)
)

dt_b[]
##      bar     value
##   1:   c 0.4904536
##   2:   c 0.9067509
##   3:   b 0.1831664
##   4:   c 0.0203943
##   5:   c 0.8707686
##   6:   a 0.4224133
##   7:   a 0.6025349
##   8:   b 0.4916672
##   9:   a 0.4566726
##  10:   b 0.8841110

I want to left join dt_b onto dt_a by reference, summing over multiple matches. One way to do so is to first create a summary of dt_b (which resolves the multiple matches) and merge it afterwards.

dt_b_summary <- dt_b[, .(value=sum(value)), bar]
dt_a[dt_b_summary, value_good:=value, on=c(foo="bar")]
dt_a[]
##     foo value_good
##  1:   a   1.481621
##  2:   b   1.558945
##  3:   c   2.288367

However, this allocates memory for the intermediate object dt_b_summary, which is inefficient.

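I would like to get the same result by joining directly on dt_b and summing over the multiple matches. I'm looking for something like the code below, but it doesn't work: sum(value) is computed once over all matched rows, so every row of dt_a gets the grand total.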

dt_a[dt_b, value_bad:=sum(value), on=c(foo="bar")]
dt_a[]
##     foo value_good value_bad
##  1:   a   1.481621  5.328933
##  2:   b   1.558945  5.328933
##  3:   c   2.288367  5.328933

Does anyone know if this is possible?

asked Dec 10 '25 11:12 by J.P. Le Cavalier


1 Answer

We can use by = .EACHI, which evaluates j once for each row of i (here, each row of dt_a):

library(data.table)
dt_b[dt_a, .(value = sum(value)), on = .(bar = foo), by = .EACHI]
#   bar    value
#1:   a 1.481621
#2:   b 1.558945
#3:   c 2.288367
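
To see why the attempt in the question gives 5.328933 on every row, it may help to compare the same join with and without by = .EACHI. This is only a sketch: the values of dt_b are typed in by hand from the printout in the question rather than regenerated from the seed, and total and per_key are names made up for the illustration.

```r
library(data.table)

# Same rows as the dt_b printed in the question, entered by hand
dt_a <- data.table(foo = c("a", "b", "c"))
dt_b <- data.table(
  bar   = c("c", "c", "b", "c", "c", "a", "a", "b", "a", "b"),
  value = c(0.4904536, 0.9067509, 0.1831664, 0.0203943, 0.8707686,
            0.4224133, 0.6025349, 0.4916672, 0.4566726, 0.8841110)
)

# No grouping: j runs once over all matched rows, giving the grand total
total <- dt_b[dt_a, sum(value), on = .(bar = foo)]

# by = .EACHI: j runs once per row of dt_a, over that row's matches only
per_key <- dt_b[dt_a, .(value = sum(value)), on = .(bar = foo), by = .EACHI]

print(total)
print(per_key)
```

The first result is the single number that was recycled into value_bad; the second has one sum per key.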

If we want to update the original object dt_a by reference:

dt_a[, value := dt_b[.SD,  sum(value), on = .(bar = foo), by = .EACHI]$V1]
dt_a
#   foo    value
#1:   a 1.481621
#2:   b 1.558945
#3:   c 2.288367
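
Assigning $V1 back works because the .EACHI result has one row per row of dt_a, in dt_a's order, as long as j returns a single value. A key with no match in dt_b still produces a row, so the behaviour is that of a left join. Here is a small sketch with toy data (the key "d" and the name zero_fill are made up for the illustration), passing dt_a itself as i:

```r
library(data.table)

dt_a <- data.table(foo = c("a", "b", "c", "d"))  # "d" has no match in dt_b
dt_b <- data.table(bar = c("c", "a", "a", "b"), value = c(1, 2, 3, 4))

# Optional variant: na.rm = TRUE turns the unmatched key into 0 instead of NA
zero_fill <- dt_b[dt_a, sum(value, na.rm = TRUE),
                  on = .(bar = foo), by = .EACHI]$V1

# Default: the unmatched key "d" gets NA, as in a left join
dt_a[, value := dt_b[dt_a, sum(value), on = .(bar = foo), by = .EACHI]$V1]

print(dt_a)
print(zero_fill)
```

Whether NA or 0 is the right answer for an unmatched key depends on what the sum is used for downstream.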

For multiple columns, sum each one with lapply(.SD, sum) and assign them all at once:

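# add a second numeric column to dt_b, for illustration
dt_b$value1 <- dt_b$value
# names of the columns to sum and to create in dt_a
nm1 <- c('value', 'value1')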
dt_a[, (nm1) := dt_b[.SD, lapply(.SD, sum), 
       on = .(bar = foo), by = .EACHI][, .SD, .SDcols = nm1]]
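
A possible variant, assuming dt_b may hold other columns that should not be summed, is to pass .SDcols to the join itself so that only the columns in nm1 are aggregated. The sketch below uses toy data and a made-up intermediate called sums; it has only one row per row of dt_a, so it is small compared with dt_b.

```r
library(data.table)

dt_a <- data.table(foo = c("a", "b", "c"))
dt_b <- data.table(bar = c("c", "a", "a", "b"), value = c(1, 2, 3, 4))
dt_b[, value1 := 10 * value]  # second column, for illustration only

nm1 <- c("value", "value1")

# .SDcols limits .SD to the columns in nm1 within each .EACHI group
sums <- dt_b[dt_a, lapply(.SD, sum), on = .(bar = foo),
             by = .EACHI, .SDcols = nm1]

# Drop the key column and assign the sums to dt_a by reference
dt_a[, (nm1) := sums[, nm1, with = FALSE]]

print(dt_a)
```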

answered Dec 12 '25 01:12 by akrun


