I have 2 reproducible dataframes over here. I am trying to identify which column contain values that are similar to another column. I hope my code will take in every row and loop through every single column in df2. My code works below, but it requires fine-tuning to allow multiple matches with the same column.
df1 <- data.frame(fruit=c("Apple", "Orange", "Pear"), location = c("Japan", "China", "Nigeria"), price = c(32,53,12))
df2 <- data.frame(grocery = c("Durian", "Apple", "Watermelon"),
place=c("Korea", "Japan", "Malaysia"),
name = c("Mark", "John", "Tammy"),
favourite.food = c("Apple", "Wings", "Cakes"),
invoice = c("XD1", "XD2", "XD3"))
df <- sapply(names(df1), function(x) {
temp <- sapply(names(df2), function(y)
if(any(match(df1[[x]], df2[[y]], nomatch = FALSE))) y else NA)
ifelse(all(is.na(temp)), NA, temp[which.max(!is.na(temp))])
}
)
t1 <- data.frame(lapply(df, type.convert), stringsAsFactors=FALSE)
t1 <- data.frame(t(t1))
t1 <- cbind(newColName = rownames(t1), t1)
rownames(t1) <- 1:nrow(t1)
colnames(t1) <- c("Columns from df1", "Columns from df2")
df1
fruit location price
1 Apple Japan 32
2 Orange China 53
3 Pear Nigeria 12
df2
grocery place name favourite.food invoice
1 Durian Korea Mark Apple XD1
2 Apple Japan John Wings XD2
3 Watermelon Malaysia Tammy Cakes XD3
t1 #(OUTPUT FROM CODE ABOVE)
Columns from df1 Columns from df2
1 fruit grocery
2 location place
3 price <NA>
This is the output I hope to obtain instead:
Columns from df1 Columns from df2
1 fruit grocery, favourite.food
2 location place
3 price <NA>
Notice that the columns, "Grocery" and "favourite.food" both matches to the column "fruit", whereas my code only returns one column.
We can change the code to return all the matches instead and wrap them in one string using toString
vec <- sapply(names(df1), function(x) {
temp <- sapply(names(df2), function(y)
if(any(match(df1[[x]], df2[[y]], nomatch = FALSE))) y else NA)
ifelse(all(is.na(temp)), NA, toString(temp[!is.na(temp)]))
}
)
vec
# fruit location price
#"grocery, favourite.food" "place" NA
To convert it into dataframe, we can do
data.frame(columns_from_df1 = names(vec), columns_from_df2 = vec, row.names = NULL)
# columns_from_df1 columns_from_df2
#1 fruit grocery, favourite.food
#2 location place
#3 price <NA>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With