
merge dataframes based on common columns but keeping all rows from x [duplicate]

Tags: merge, dataframe, r

I need to merge two dataframes x and y which have about 50 columns in common and some unique columns, and I need to keep all the rows from x.

It works if I run:

 NewDataframe <- merge(x, y, by=c("ColumnA", "ColumnB", "ColumnC"),all.x=TRUE)

The issue is that there are more than 50 common columns, and I would rather avoid typing the names of all the common columns.

I have tried with:

 NewDataframe <- merge(x, y, all.x=TRUE)

But the following error appears:

 Error in merge.data.table(x, y, all.x = TRUE) :
 Elements listed in `by` must be valid column names in x and y

Is there any way of using by with the common columns without typing all of them, but keeping all the rows from x?

Thank you.

dede asked Oct 15 '25 12:10


1 Answer

You want to merge based on all common columns. So first you need to find out which column names are common between the two dataframes.

common_col_names <- intersect(names(x), names(y))

Then pass this character vector as the by argument of the merge function.

merge(x, y, by=common_col_names, all.x=TRUE)
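
As a minimal sketch of the two steps above, here is the same approach on two small made-up data frames. The column names and values are hypothetical and only serve to illustrate the idea; they are not taken from the question's data.

```r
# Toy data frames; all names and values are hypothetical.
x <- data.frame(ColumnA = c(1, 2, 3),
                ColumnB = c("a", "b", "c"),
                OnlyInX = c(10, 20, 30))
y <- data.frame(ColumnA = c(1, 2),
                ColumnB = c("a", "b"),
                OnlyInY = c(TRUE, FALSE))

# Column names present in both data frames.
common_col_names <- intersect(names(x), names(y))

# Left join: every row of x is kept; rows of x without a match in y
# get NA in the columns that exist only in y.
NewDataframe <- merge(x, y, by = common_col_names, all.x = TRUE)
```

Here the third row of x has no partner in y, so it should still appear in NewDataframe, with NA in OnlyInY.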

Edit: after reading @Andrew Gustar's answer, I double-checked the documentation for the merge function, and for data frames this is exactly the default value of the by parameter:

## S3 method for class 'data.frame'
merge(x, y, by = intersect(names(x), names(y)), # <-- Look here
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"),
      incomparables = NULL, ...)
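
A quick way to check this default on plain data frames is to compare a merge that omits by with one that passes the intersection explicitly. This is only a sketch on hypothetical toy data. Note that the error in the question is raised by merge.data.table, so x there appears to be a data.table; that method may choose its default by differently (for example from table keys, which is an assumption about the asker's setup), in which case passing by explicitly is the safer choice.

```r
# Hypothetical toy data frames (plain data.frame objects, not data.table).
x <- data.frame(id = 1:3, grp = c("a", "b", "c"), vx = c(10, 20, 30))
y <- data.frame(id = 1:2, grp = c("a", "b"), vy = c(0.5, 1.5))

# Merge relying on the default `by` of the data.frame method.
default_by <- merge(x, y, all.x = TRUE)

# Merge with the common columns passed explicitly.
explicit_by <- merge(x, y, by = intersect(names(x), names(y)), all.x = TRUE)

# Compare the two results.
same_result <- identical(default_by, explicit_by)
```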

zelite answered Oct 18 '25 08:10


