Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently updating a table from multiple sources

I'm working on improving part of an existing ETL layer in Oracle.

  1. A file is loaded in to a temporary table.
  2. Many MERGE statement are executed to resolve surrogate keys.
  3. Some other business logic is applied (which require those surrogate keys).
  4. The results are MERGEd in to a table (with both the surrogate keys and the business logic results)

It's step 2 that I want to improve, it seems less than ideal to do this as several steps.

MERGE INTO temp t
USING dimension_1 d1 ON (d1.natural_key = t.d1_natural_key)
WHEN MATCHED THEN UPDATE t.d1_id = d1.id

MERGE INTO temp t
USING dimension_2 d2 ON (d2.natural_key = t.d2_natural_key)
WHEN MATCHED THEN UPDATE t.d2_id = d2.id

MERGE INTO temp t
USING dimension_3 d3 ON (d3.natural_key = t.d3_natural_key)
WHEN MATCHED THEN UPDATE t.d3_id = d3.id

If I was writing this in SQL Server I'd do something like the following:

UPDATE
    t
SET
    d1_id = COALESCE(d1.id, -1),
    d2_id = COALESCE(d2.id, -1),
    d3_id = COALESCE(d3.id, -1)
FROM
    temp  t
LEFT JOIN
    dimension_1   d1
        ON d1.natural_key = t.d1_natural_key
LEFT JOIN
    dimension_2   d2
        ON d2.natural_key = t.d2_natural_key
LEFT JOIN
    dimension_3   d3
        ON d3.natural_key = t.d3_natural_key

For the life of me I can't find what seems like a sensible option in Oracle. The best I have been able to work out is to use UPDATE (while everyone around me is screaming that I 'must' use MERGE) and correlated sub-queries; something like...

UPDATE
    temp t
SET
    d1_id = COALESCE((SELECT id FROM dimension_1 d1 WHERE d1.natural_key = t.d1_natural_key), -1),
    d2_id = COALESCE((SELECT id FROM dimension_2 d2 WHERE d2.natural_key = t.d2_natural_key), -1),
    d3_id = COALESCE((SELECT id FROM dimension_3 d3 WHERE d3.natural_key = t.d3_natural_key), -1)

Are there any better alternatives? Or is the correlated sub-query approach actually performant in Oracle?

like image 850
MatBailie Avatar asked Dec 11 '25 00:12

MatBailie


1 Answers

I think the equivalent of your SQL Server update would be:

UPDATE
    temp t1
SET
    (d1_id, d2_id, d3_id) = (
SELECT
    COALESCE(d1.id, -1),
    COALESCE(d2.id, -1),
    COALESCE(d3.id, -1)
FROM
    temp  t2
LEFT JOIN
    dimension_1   d1
        ON d1.natural_key = t2.d1_natural_key
LEFT JOIN
    dimension_2   d2
        ON d2.natural_key = t2.d2_natural_key
LEFT JOIN
    dimension_3   d3
        ON d3.natural_key = t2.d3_natural_key
WHERE
    t2.id = t1.id 
)

It's still a correlated update; the joining takes place in the subquery, since Oracle doesn't let you join as part of the update itself. Normally you wouldn't need (or want) to refer to the target outer table again in the subquery, but you need something to outer-join against here.

You can also combine the left-join approach with a merge, putting essentially the same subquery into the using clause:

MERGE INTO temp t
USING (
  SELECT t.id,
    COALESCE(d1.id, -1) AS d1_id,
    COALESCE(d2.id, -1) AS d2_id,
    COALESCE(d3.id, -1) AS d3_id
  FROM
    temp  t
  LEFT JOIN
    dimension_1   d1
      ON d1.natural_key = t.d1_natural_key
  LEFT JOIN
    dimension_2   d2
      ON d2.natural_key = t.d2_natural_key
  LEFT JOIN
    dimension_3   d3
      ON d3.natural_key = t.d3_natural_key
) d
ON (d.id = t.id)
WHEN MATCHED THEN UPDATE SET
  t.d1_id = d.d1_id,
  t.d2_id = d.d2_id,
  t.d3_id = d.d3_id

I don't see any real benefit of using merge over update in this case though.

Both will overwrite any existing values in your three ID columns, but it sounds like you are not expecting there to be any.

like image 108
Alex Poole Avatar answered Dec 12 '25 17:12

Alex Poole