Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Duplicate data when using Snowflake SQL left join

I'm having trouble understanding LEFT JOIN results in Snowflake.

select * from
(select 1 as AccountIdA) s
LEFT OUTER JOIN (Select 1 as AccountIdB, 'Val1' as Col1 union select 1, 'Val2' as Col1) t1 on t1.AccountIdB = s.AccountIdA
LEFT OUTER JOIN (Select 1 as AccountIdC, 'Val3' as Col2) t2 on t2.AccountIdC = s.AccountIdA

The SQL above results in the following:

AccountIdA AccountIdB Col1 AccountIdC Col2
1 1 VAL1 1 VAL3
1 1 VAL2 1 VAL3

However, I am expecting the following, because there is only one row match from the second left join.

AccountIdA AccountIdB Col1 AccountIdC Col2
1 1 VAL1 1 VAL3
1 1 VAL2 null null

Is there any logic to this? Any way to get the result that I'm expecting?

like image 837
Eric Bowden Avatar asked Feb 23 '26 05:02

Eric Bowden


1 Answers

so just reformed that SQL so is was less wide, and just to make sense of it.

select * from (
    select 1 as AccountIdA
) s
LEFT OUTER JOIN (
    SELECT * FROM VALUES (1,'val1'), (1,  'Val2') v(AccountIdB, col1)
) t1 on t1.AccountIdB = s.AccountIdA
LEFT OUTER JOIN (
    SELECT * FROM VALUES (1,'val3') v(AccountIdC, col2)
) t2 on t2.AccountIdC = s.AccountIdA;

But you get two rows, because of standard SQL logic. The joins happen is serial, thus you first have s with a single AccountIdA of 1.

AccountIdA
1

Then we join to t1 and we have

AccountIdA AccountIdB col1
1 1 'val1'
1 1 'val2'

no surprises here:

now we join to t2 on where given we have 2 rows with AccountIdA we match the AccountIdC against both of those, and given they are both 1 we match both rows.

AccountIdA AccountIdB col1 AccountIdC col2
1 1 'val1' 1 'val3'
1 1 'val2' 1 'val3'

now what a LEFT JOIN gives you is if the left side is not matched by the right side we get NULL, this can swapping some things around like:

select * from (
    select column1 as AccountIdA FROM VALUES (1),(2),(3)
) s
LEFT OUTER JOIN (
    SELECT * FROM VALUES (1,'val1'), (2,  'Val2') v(AccountIdB, col1)
) t1 on t1.AccountIdB = s.AccountIdA
LEFT OUTER JOIN (
    SELECT * FROM VALUES (3,'val3') v(AccountIdC, col2)
) t2 on t2.AccountIdC = s.AccountIdA;

now we get matches on the keys that make sense, and no matches on the others.

ACCOUNTIDA ACCOUNTIDB COL1 ACCOUNTIDC COL2
1 1 val1
2 2 Val2
3 3 val3

So this goes to show when doing JOINs you have to make sure you have de-duplicated data, otherwise you can get multiplication of the rows

select * from (
    select column1 as AccountIdA FROM VALUES (1),(1)
) s
LEFT OUTER JOIN (
    SELECT * FROM VALUES (1,'val1'), (1,  'Val2') v(AccountIdB, col1)
) t1 on t1.AccountIdB = s.AccountIdA
LEFT OUTER JOIN (
    SELECT * FROM VALUES (1,'val3'),(1,'val4') v(AccountIdC, col2)
) t2 on t2.AccountIdC = s.AccountIdA;

gives an explosion to:

ACCOUNTIDA ACCOUNTIDB COL1 ACCOUNTIDC COL2
1 1 val1 1 val3
1 1 val1 1 val4
1 1 Val2 1 val3
1 1 Val2 1 val4
1 1 val1 1 val3
1 1 val1 1 val4
1 1 Val2 1 val3
1 1 Val2 1 val4
like image 186
Simeon Pilgrim Avatar answered Feb 27 '26 01:02

Simeon Pilgrim



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!