I want to merge two data.tables using a condition on a variable in dt1. See the following example:
library(data.table)
# Example data tables
dt1 <- data.table(id = 1:5, var1 = c(1, 0, 1, 0, 1), name1 = c("A", "B", "C", "D", "E"))
dt2 <- data.table(name2 = c("A", "B", "C", "D", "E"), other_var = 101:105)
> dt1
id var1 name1
1: 1 1 A
2: 2 0 B
3: 3 1 C
4: 4 0 D
5: 5 1 E
> dt2
name2 other_var
1: A 101
2: B 102
3: C 103
4: D 104
5: E 105
What I would like to do is merge dt1 and dt2 such that a value is merged only if var1 == 1 in dt1. Desired result:
> dt1
id var1 name1 other_var
1: 1 1 A 101
2: 2 0 B NA
3: 3 1 C 103
4: 4 0 D NA
5: 5 1 E 105
I've tried something like:
dt1[var1 == 1][dt2, other_var := i.other_var, on = .(name1 = name2)]
and
dt1[dt2, on = .(name1 = name2), other_var := i.other_var, var1 == 1]
but neither approach seems to work. What am I doing wrong?
You are expressing a condition by which you want to join, so set dt2[, var1 := 1] then join on var1 as well as well as name:
dt2[, var1 := 1]
dt1[dt2,
on = .(name1 = name2, var1),
other_var := i.other_var
][]
# id var1 name1 other_var
# <int> <num> <char> <int>
# 1: 1 1 A 101
# 2: 2 0 B NA
# 3: 3 1 C 103
# 4: 4 0 D NA
# 5: 5 1 E 105
This should be reasonably fast. If adding the column is a problem just remove it after the join (dt2[, var1 := NULL]).
You can try
dt1[dt2,
other_var := replace(other_var, !var1, NA),
on = .(name1 = name2)
]
or
dt1[dt2,
other_var := NA^(!var1) * other_var,
on = .(name1 = name2)
]
which gives
id var1 name1 other_var
1: 1 1 A 101
2: 2 0 B NA
3: 3 1 C 103
4: 4 0 D NA
5: 5 1 E 105
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With