I'm looking to join the persons data frame with the colour_match data frame, so that the person data frame will have an additional column "favourite_colour".
first_name <- c("Nick", "Emily", "Joe", "Kate", "Kate", "Nick", "Joe")
last_name <- c("Smith", "Doe", "Lee", "Jackson", "Deng", "Smith", "Lee")
age <- c(28, 30, 52, 18, 42, 28, 52)
pet <- c("Cat", "Dog", "Rabbit","Cat", "Mouse", "Snake", "Bird")
person_pet <- data.frame(first_name, last_name, age, pet)

first_name <- c("Emily","Nick", "Kate", "Joe", "Kate")
last_name <- c("Doe", "Smith", NA, "Lee", NA)
age <- c(NA, NA, 42, 52, 29)
favourite_colour <- c("black", "yellow", "red", "green", "white")
colour_match <- data.frame(first_name, last_name, age, favorite_colour)

The challenge is that I will need to join by as many columns as possible as long as the column has a value in the colour_match data frame. The available columns in the colour_match table determines the "join columns" between the two tables. For example, for "Nick Smith", join by first_name and last_name; for "Joe Lee", join by all three columns; . In the end, the result should look like below:

Appreciate any help!
I would create a small helper function which checks the availability of possible join columns and then joins by all available ones. This function then gets called for each row separately and the result is concatenated again.
library(dplyr, warn.conflicts = F)
library(tidyr)
first_name <- c("Nick", "Emily", "Joe", "Kate", "Kate")
last_name <- c("Smith", "Doe", "Lee", "Jackson", "Deng")
age <- c(28, 30, 52, 18, 42)
persons <- tibble(first_name, last_name, age)
first_name <- c("Nick", "Emily", "Joe", "Kate", "Kate")
last_name <- c("Smith", NA, NA, "Jackson", NA)
age <- c(NA, NA, 52, 18, 29)
favorite_colour <- c("black", "yellow", "red", "green", "white")
colour_match <- tibble(first_name, last_name, age, favorite_colour)
## ---------------------------------------------------------------------------
# helper function
join_if_available <- function(x, y) {
# define possible set of join cols
join_cols <- c("first_name", "last_name", "age")
# check which (if any) col is missing
missing <- which(is.na(y[join_cols]))
if (length(missing) > 0) {
# remove missing index and then only keep the available join cols
# plus the favorite_colour col
join_cols <- join_cols[-missing]
y <- dplyr::select(y, join_cols, favorite_colour)
}
# perform a simple left join
out <- dplyr::left_join(x, y, by = join_cols)
return(out)
}
# split each data frame by row, iterate over each row separately,
# then rowbind the result
purrr::map2_dfr(.x = split(persons, 1:nrow(persons)),
.y = split(colour_match, 1:nrow(colour_match)),
.f = ~ join_if_available(x = .x, y = .y))
#> # A tibble: 5 × 4
#> first_name last_name age favorite_colour
#> <chr> <chr> <dbl> <chr>
#> 1 Nick Smith 28 black
#> 2 Emily Doe 30 yellow
#> 3 Joe Lee 52 red
#> 4 Kate Jackson 18 green
#> 5 Kate Deng 42 <NA>
Created on 2023-09-15 by the reprex package (v2.0.1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With