I'm working on improving part of an existing ETL layer in Oracle.
It's step 2 that I want to improve, it seems less than ideal to do this as several steps.
MERGE INTO temp t
USING dimension_1 d1 ON (d1.natural_key = t.d1_natural_key)
WHEN MATCHED THEN UPDATE t.d1_id = d1.id
MERGE INTO temp t
USING dimension_2 d2 ON (d2.natural_key = t.d2_natural_key)
WHEN MATCHED THEN UPDATE t.d2_id = d2.id
MERGE INTO temp t
USING dimension_3 d3 ON (d3.natural_key = t.d3_natural_key)
WHEN MATCHED THEN UPDATE t.d3_id = d3.id
If I was writing this in SQL Server I'd do something like the following:
UPDATE
t
SET
d1_id = COALESCE(d1.id, -1),
d2_id = COALESCE(d2.id, -1),
d3_id = COALESCE(d3.id, -1)
FROM
temp t
LEFT JOIN
dimension_1 d1
ON d1.natural_key = t.d1_natural_key
LEFT JOIN
dimension_2 d2
ON d2.natural_key = t.d2_natural_key
LEFT JOIN
dimension_3 d3
ON d3.natural_key = t.d3_natural_key
For the life of me I can't find what seems like a sensible option in Oracle. The best I have been able to work out is to use UPDATE (while everyone around me is screaming that I 'must' use MERGE) and correlated sub-queries; something like...
UPDATE
temp t
SET
d1_id = COALESCE((SELECT id FROM dimension_1 d1 WHERE d1.natural_key = t.d1_natural_key), -1),
d2_id = COALESCE((SELECT id FROM dimension_2 d2 WHERE d2.natural_key = t.d2_natural_key), -1),
d3_id = COALESCE((SELECT id FROM dimension_3 d3 WHERE d3.natural_key = t.d3_natural_key), -1)
Are there any better alternatives? Or is the correlated sub-query approach actually performant in Oracle?
I think the equivalent of your SQL Server update would be:
UPDATE
temp t1
SET
(d1_id, d2_id, d3_id) = (
SELECT
COALESCE(d1.id, -1),
COALESCE(d2.id, -1),
COALESCE(d3.id, -1)
FROM
temp t2
LEFT JOIN
dimension_1 d1
ON d1.natural_key = t2.d1_natural_key
LEFT JOIN
dimension_2 d2
ON d2.natural_key = t2.d2_natural_key
LEFT JOIN
dimension_3 d3
ON d3.natural_key = t2.d3_natural_key
WHERE
t2.id = t1.id
)
It's still a correlated update; the joining takes place in the subquery, since Oracle doesn't let you join as part of the update itself. Normally you wouldn't need (or want) to refer to the target outer table again in the subquery, but you need something to outer-join against here.
You can also combine the left-join approach with a merge, putting essentially the same subquery into the using clause:
MERGE INTO temp t
USING (
SELECT t.id,
COALESCE(d1.id, -1) AS d1_id,
COALESCE(d2.id, -1) AS d2_id,
COALESCE(d3.id, -1) AS d3_id
FROM
temp t
LEFT JOIN
dimension_1 d1
ON d1.natural_key = t.d1_natural_key
LEFT JOIN
dimension_2 d2
ON d2.natural_key = t.d2_natural_key
LEFT JOIN
dimension_3 d3
ON d3.natural_key = t.d3_natural_key
) d
ON (d.id = t.id)
WHEN MATCHED THEN UPDATE SET
t.d1_id = d.d1_id,
t.d2_id = d.d2_id,
t.d3_id = d.d3_id
I don't see any real benefit of using merge over update in this case though.
Both will overwrite any existing values in your three ID columns, but it sounds like you are not expecting there to be any.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With