I have two dataframes with several columns, but for simplicity sake lets say my two dataframes are as such:
## Dataframe #1
df1 <- data.frame(name = c("Jake", "Paul", "Luis", "Leon"),
salary_new = c(60, 80, 90, 50))
## Dataframe #2
df2 <- data.frame(name = c("Paul Henderson", "John F. Smith", "Leon K.", "Luis Sierra"),
salary_old = c(60, 55, 60, 80))
I would like to do a sort of mapping between the two name columns where I would replace the names in DF1 with those in DF2 to produce the following results:
df3 <- data.frame(name = c("Jake", "Paul Henderson", "Luis Sierra", "Leon K."),
salary_new = c(60, 80, 90, 50))
What I am having trouble with is that each dataframe has names that don't belong to the other and the ones that do belong have a slight variation to them (include a last name, a middle initial). Is there a way to do a fuzzy matching of sorts to get the desired output and leave the names that aren't in DF2 the same as they appear in DF1?
We could extract the first word and do a join
library(dplyr)
library(stringr)
df2 %>%
mutate(name2 =word(name, 1)) %>%
left_join(df1, ., by = c("name" = "name2")) %>%
transmute(name = coalesce( name.y, name), salary_new)
-output
name salary_new
1 Jake 60
2 Paul Henderson 80
3 Luis Sierra 90
4 Leon K. 50
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With