Suppose I have two data frames like the following:
df1 <- data.frame(
id1 = c(1:4, 4),
id2 = letters[1:5],
val1 = c(0, 1, pi, exp(1), 42)
)
df2 <- data.frame(
id1 = c(1:4, 4),
id2 = c(NA, letters[2:5]),
val2 = c("Lorem", "ipsum", "dolor", "sit", "amet")
)
## df1 df2
## id1 id2 val1 id1 id2 val2
## 1 1 a 0.000000 1 <NA> Lorem
## 2 2 b 1.000000 2 b ipsum
## 3 3 c 3.141593 3 c dolor
## 4 4 d 2.718282 4 d sit
## 5 4 e 42.000000 4 e amet
This would be my desired result:
desired_result <- data.frame(
id1 = c(1:4, 4),
id2 = letters[1:5],
val1 = c(0, 1, pi, exp(1), 42),
val2 = c("Lorem", "ipsum", "dolor", "sit", "amet")
)
## id1 id2 val1 val2
## 1 1 a 0.000000 Lorem
## 2 2 b 1.000000 ipsum
## 3 3 c 3.141593 dolor
## 4 4 d 2.718282 sit
## 5 4 e 42.000000 amet
In my desired result, I'd like to use the information in column id2, when available,
to resolve join ambiguities raised by duplicate values in id1.
For example, rows 4 and 5 have the same id1, but we can differentiate them by id2.
So, if I try to join just on id1, I get too many observations,
because I'm not utilizing this information in id2 to match records:
library(dplyr)
left_join(df1, df2, by = "id1")
## id1 id2.x val1 id2.y val2
## 1 1 a 0.000000 <NA> Lorem
## 2 2 b 1.000000 b ipsum
## 3 3 c 3.141593 c dolor
## 4 4 d 2.718282 d sit
## 5 4 d 2.718282 e amet
## 6 4 e 42.000000 d sit
## 7 4 e 42.000000 e amet
However, if I try to join on both IDs, I lose the information in val2 for row 1,
because df2 has NA for id2 on row 1:
left_join(df1, df2, by = c("id1", "id2"))
## id1 id2 val1 val2
## 1 1 a 0.000000 <NA>
## 2 2 b 1.000000 ipsum
## 3 3 c 3.141593 dolor
## 4 4 d 2.718282 sit
## 5 4 e 42.000000 amet
How can I left_join() (or equivalent) to achieve my desired_result?
An option using data.table:
library(data.table)
setDT(df1)
setDT(df2)
df1[df2, on=.(id1, id2), mult="first", val2 := val2]
df1[is.na(val2), val2 :=
df2[.SD, on=.(id1), mult="first", val2]]
I have taken the liberty of using the first value if there are multiple joins (i.e. the combination of id1 and id2 in df2 are not unique).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With