
Row bind tables in SQL with differing columns

Tags: sql, join, r, union

I have a very simple request: I'd like to "stack" (vertically combine) tables in SQL that share some, but not all, column names.

If I were to attack this problem in R, Hadley Wickham's dplyr package has a nice function called bind_rows() that binds different tables by row and fills in NA wherever a column doesn't exist in one of the tables.

As an example, let's say we have table 'A':

a <- head(iris) %>% 
dplyr::mutate(., species_num = 1:nrow(.)) %>%
dplyr::select(., -Petal.Width)


And table 'B':

b <- tail(iris) %>%
dplyr::mutate(., species_num = 7:12)


Notice that table B has an extra column Petal.Width.

As I stated above, the R function bind_rows() in the dplyr package will do the following:

nice <- dplyr::bind_rows(a, b)


Pretty nice, right?

Well, I want to perform this same action in SQL, but UNION fails when column number and/or names differ...

(SELECT * FROM a)
UNION
(SELECT * FROM b);


Now, I realize that I can simply add the Petal.Width column to table a before using UNION, but the real-world issue I'm tackling involves over 30 tables, each containing some columns but not others, and my end goal is to automate this process. In short, I need a solution that will not require me to hack around the problem or add columns manually to individual tables.

Any ideas?

awags1 asked May 11 '26 22:05


1 Answer

Try this:

Prep with some fake data:

# con <- dbConnect(...)
DBI::dbWriteTable(con, "iris1", iris[1:3,-1])
DBI::dbWriteTable(con, "iris2", iris[4:6,-2])
DBI::dbWriteTable(con, "iris23", iris[7:9,-(2:3)])

Set up the list of field names:

list_of_tables <- c("iris1", "iris2", "iris23")
eachnames <- sapply(list_of_tables, function(a) DBI::dbQuoteIdentifier(con, DBI::dbListFields(con, a)), simplify = FALSE)
str(eachnames)
# List of 3
#  $ iris1 :Formal class 'SQL' [package "DBI"] with 1 slot
#   .. ..@ .Data: chr [1:4] "\"Sepal.Width\"" "\"Petal.Length\"" "\"Petal.Width\"" "\"Species\""
#  $ iris2 :Formal class 'SQL' [package "DBI"] with 1 slot
#   .. ..@ .Data: chr [1:4] "\"Sepal.Length\"" "\"Petal.Length\"" "\"Petal.Width\"" "\"Species\""
#  $ iris23:Formal class 'SQL' [package "DBI"] with 1 slot
#   .. ..@ .Data: chr [1:3] "\"Sepal.Length\"" "\"Petal.Width\"" "\"Species\""
allnames <- unique(unlist(eachnames, use.names=FALSE))
allnames
# [1] "\"Sepal.Width\""  "\"Petal.Length\"" "\"Petal.Width\""  "\"Species\""     
# [5] "\"Sepal.Length\""

I used DBI::dbQuoteIdentifier to be a little defensive in general, though here it is actually required because of the column names (I'm using Postgres, which doesn't accept an unquoted period in a field name).
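To see what the quoting does without a live database, here is a minimal sketch. It assumes the DBI package is installed and uses DBI::ANSI(), a dummy connection that applies ANSI-style quoting; a real Postgres connection may differ in details. The table name "iris1" is just the one from the fake data above.

```r
library(DBI)

# DBI::ANSI() stands in for a real connection (assumption: ANSI quoting
# rules are close enough to your database's for illustration).
ansi <- DBI::ANSI()

# Column names with a period are wrapped in double quotes.
quoted_cols <- DBI::dbQuoteIdentifier(ansi, c("Sepal.Width", "Species"))
print(as.character(quoted_cols))

# Table names can be quoted the same way before pasting them into a query.
quoted_tbl <- DBI::dbQuoteIdentifier(ansi, "iris1")
print(as.character(quoted_tbl))
```

Note that the query further down pastes the table names in unquoted; if your table names contain periods, spaces, or mixed case, quoting them the same way is probably worth it.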

The list of field names for each table, with "null as" added in front of every column that table lacks, can be made with this:

list_of_fields <- lapply(eachnames, function(a) {
  paste(ifelse(allnames %in% a, allnames, paste("null as", allnames)), collapse = ", ")
})
str(list_of_fields)
# List of 3
#  $ iris1 : chr "\"Sepal.Width\", \"Petal.Length\", \"Petal.Width\", \"Species\", null as \"Sepal.Length\""
#  $ iris2 : chr "null as \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\", \"Species\", \"Sepal.Length\""
#  $ iris23: chr "null as \"Sepal.Width\", null as \"Petal.Length\", \"Petal.Width\", \"Species\", \"Sepal.Length\""
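With 30-odd tables it can help to check which columns each table is missing before building any SQL. A small base-R sketch, using plain (unquoted) column names as stand-ins for the dbListFields() results; cols_by_table and missing_by_table are names made up for this example:

```r
# Stand-in for what dbListFields() would return per table (assumption:
# same columns as the fake iris tables above).
cols_by_table <- list(
  iris1  = c("Sepal.Width", "Petal.Length", "Petal.Width", "Species"),
  iris2  = c("Sepal.Length", "Petal.Length", "Petal.Width", "Species"),
  iris23 = c("Sepal.Length", "Petal.Width", "Species")
)

# Every column seen in any table, in order of first appearance.
all_cols <- unique(unlist(cols_by_table, use.names = FALSE))

# For each table, the columns that would need a "null as" placeholder.
missing_by_table <- lapply(cols_by_table, function(a) setdiff(all_cols, a))
str(missing_by_table)
```

Any table whose entry is empty already has every column and needs no padding.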

If you have more complex query needs, then that's a good start. Here's a query that does no additional filtering:

qry <- paste(
  mapply(function(nm, flds) {
    paste("( select",
          paste(ifelse(allnames %in% flds, allnames, paste("null as", allnames)),
                collapse = ", "),
          "from", nm, ")")
  }, names(eachnames), eachnames),
  collapse = " union\n")
cat(qry)
# ( select "Sepal.Width", "Petal.Length", "Petal.Width", "Species", null as "Sepal.Length" from iris1 ) union
# ( select null as "Sepal.Width", "Petal.Length", "Petal.Width", "Species", "Sepal.Length" from iris2 ) union
# ( select null as "Sepal.Width", null as "Petal.Length", "Petal.Width", "Species", "Sepal.Length" from iris23 )
DBI::dbGetQuery(con, qry)
#   Sepal.Width Petal.Length Petal.Width Species Sepal.Length
# 1          NA          1.7         0.4  setosa          5.4
# 2          NA           NA         0.3  setosa          4.6
# 3          NA          1.5         0.2  setosa          4.6
# 4          NA          1.4         0.2  setosa          5.0
# 5         3.0          1.4         0.2  setosa           NA
# 6         3.2          1.3         0.2  setosa           NA
# 7          NA           NA         0.2  setosa          5.0
# 8          NA           NA         0.2  setosa          4.4
# 9         3.5          1.4         0.2  setosa           NA
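One caveat: UNION drops duplicate rows, whereas bind_rows() keeps them, so UNION ALL is the closer match if exact row-binding is the goal. The sketch below wraps the same idea in a function with that as the default. build_union_query and keep_duplicates are hypothetical names for this example, not part of DBI or dplyr, and the input is assumed to be a named list of already-quoted column names (like eachnames above).

```r
# Hypothetical helper: build a row-binding query from a named list that maps
# table name -> character vector of (already quoted) column names.
build_union_query <- function(cols_by_table, keep_duplicates = TRUE) {
  stopifnot(is.list(cols_by_table), !is.null(names(cols_by_table)))

  # All columns across all tables, in order of first appearance.
  all_cols <- unique(unlist(cols_by_table, use.names = FALSE))

  # One SELECT per table; columns the table lacks become "null as <col>".
  selects <- mapply(function(tbl, cols) {
    fields <- ifelse(all_cols %in% cols, all_cols, paste("null as", all_cols))
    paste("( select", paste(fields, collapse = ", "), "from", tbl, ")")
  }, names(cols_by_table), cols_by_table)

  # UNION ALL keeps duplicate rows (like bind_rows); UNION removes them.
  sep <- if (keep_duplicates) " union all\n" else " union\n"
  paste(selects, collapse = sep)
}

# Toy input with made-up table and column names.
cols <- list(
  t1 = c('"x"', '"y"'),
  t2 = c('"y"', '"z"')
)
qry_all <- build_union_query(cols)
cat(qry_all, "\n")
# ( select "x", "y", null as "z" from t1 ) union all
# ( select null as "x", "y", "z" from t2 )
```

The result can be passed to DBI::dbGetQuery(con, qry_all) as before. Depending on the database, an untyped null may need an explicit cast when a column's first few branches are all null.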

Many DBAs will advise against using SELECT * in general, so listing the columns explicitly like this has a secondary benefit.

r2evans answered May 14 '26 13:05


