Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

efficient way to compare two tables in bigquery

I am interested in comparing, whether two tables contain the same data.

I could do it like this:

#standardSQL
SELECT
    key1, key2
FROM
(
    SELECT 
    table1.key1,
    table1.key2,
    table1.column1 - table2.column1 as col1,
    table1.col2 - table2.col2 as col2
    FROM
        `table1` AS table1
    LEFT JOIN
        `table2` AS table2
    ON
        table1.key1 = table2.key1
    AND
        table1.key2 = table2.key2
)
WHERE 
    col1 != 0
OR
    col2 != 0

But when I want to compare all numerical columns, this is kind of hard, especially if I want to do it for multiple table combinations.

Therefore my question: Is someone aware of a possibility to iterate over all numerical columns and restrict the result set to those keys where any of these differences where not zero?

like image 606
Nico Albers Avatar asked Aug 31 '25 15:08

Nico Albers


1 Answers

In Standard SQL, we found using a UNION ALL of two EXCEPT DISTINCT's works for our use cases:

(
  SELECT * FROM table1
  EXCEPT DISTINCT
  SELECT * FROM table2
)

UNION ALL

(
  SELECT * FROM table2
  EXCEPT DISTINCT
  SELECT * FROM table1
)

This will produce differences in both directions:

  • rows in table1 that are not in table2
  • rows in table2 that are not in table1

Notes and caveats:

  • table1 and table2 must be of the same width and have columns in the same order and type.
  • this does not work directly with STRUCT or ARRAY data types. You should either UNNEST, or use TO_JSON_STRING to convert the these data types first.
  • this does not directly work with GEOGRAPHY either, you must cast to text first using ST_AsText
like image 197
Jordan Arseno Avatar answered Sep 04 '25 00:09

Jordan Arseno