Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL full join with conditions on tables

I have two versions of the same dataset and I need to do a full join to find records missing in one of them, both have some missing records. I've managed to come up with two ways of doing it but both have disadvantages. My dataset sizes and filter conditions are very big.

Solution 1 has a drawback of using CTE, which will split the filters and make the code much harder to read, I would like to have just one query:

create table #temp (id int, vers nvarchar(1))
insert into #temp select 1,'a' union select 2,'a' union select 3,'a'
              union select 1,'b' union select 2,'b' union select 100,'b'

;WITH vers_a as (SELECT * FROM #temp WHERE vers = 'a')
,vers_b as (SELECT * FROM #temp WHERE vers = 'b')

SELECT ta.id, tb.id, ta.vers, tb.vers
FROM vers_a ta
FULL JOIN vers_b tb on ta.id = tb.id
WHERE ta.id is null or tb.id is null

drop table #temp

Solution 2 duplicates the filters and the execution plan is bigger:

create table #temp (id int, vers nvarchar(1))
insert into #temp select 1,'a' union select 2,'a' union select 3,'a'
              union select 1,'b' union select 2,'b' union select 100,'b'

SELECT ta.id, tb.id, ta.vers, tb.vers
FROM #temp ta
FULL JOIN #temp tb on ta.id = tb.id and ta.vers = 'a' and tb.vers = 'b'
WHERE (ta.id is null or tb.id is null) and (ta.vers = 'a' or tb.vers = 'b')

drop table #temp

So my question is, is it possible to have something like solution 2 but without the double condition definitions and with smaller execution plan like in solution 1?

Edit: When running both solutions in one query I can see that solution 2 cots 26% and solution 1 costs 45%, though it has smaller execution plan. I would like the faster solution (not necessarily with smaller execution plan like I said in the question) without code duplication if possible.

Edit2: Sorry for misleading 1st edit, I'm not good at optimizing :) I tested this on a ~1.5mil rows set and solution 1 is faster, to get the set used this:

create table #temp (id int, vers nvarchar(1))
insert into #temp select 1,'a' union select 2,'a' union select 3,'a'
              union select 1,'b' union select 2,'b' union select 100,'b'
while (select count(*) from #temp) < 1000000
begin
    insert into #temp select id+ABS(CHECKSUM(NewId()))%10000, vers from #temp
end
like image 379
Alex_404 Avatar asked Dec 07 '25 06:12

Alex_404


1 Answers

This should have a good plan. Index on vers may help.

SELECT ta.id, tb.id, ta.vers, tb.vers
FROM (SELECT * FROM #temp WHERE vers = 'a') ta
FULL JOIN (SELECT * FROM #temp WHERE vers = 'b') tb on ta.id = tb.id 
WHERE (ta.id is null or tb.id is null) 

EDIT Made some tests. The above query has better CPU then 2 other versions.

-- SETUP
drop table temp;
go

create table temp (
   id int 
  ,vers nvarchar(1));

insert temp(id,vers)
select top(100000)
    row_number() over(order by (select null)) / 2  
  , case ABS(CHECKSUM(NewId())) % 2 when 0 then 'a' else 'b' end
from sys.all_objects t, sys.all_objects t1 ;


create index idx_temp_vers on temp(vers) include(id)
 with
  fillfactor=90;

select top(50) *
from temp;

-- TEST RUNS 
SET STATISTICS TIME ON;

print ' 1 index query 1 '
SELECT ta.id, tb.id, ta.vers, tb.vers
FROM (SELECT * FROM temp WHERE vers = 'a') ta
FULL JOIN (SELECT * FROM temp WHERE vers = 'b') tb on ta.id = tb.id 
WHERE (ta.id is null or tb.id is null)
;
print ' 1 index query 2 '
SELECT ta.id, tb.id, ta.vers, tb.vers
FROM temp ta
FULL JOIN temp tb on ta.id = tb.id and ta.vers = 'a' and tb.vers = 'b'
WHERE (ta.id is null or tb.id is null) and (ta.vers = 'a' or tb.vers = 'b')
;
print ' 1 index query 3 '
SELECT ta.id, TA.vers 
  from temp ta 
  where ta.vers = 'a' 
         and TA.id NOT IN(SELECT tb.id FROM temp tb WHERE tb.vers = 'b')
UNION ALL 
SELECT tb.id, Tb.vers 
  from temp tb 
  where tb.vers = 'b' 
         and Tb.id NOT IN(SELECT ta.id FROM temp ta WHERE ta.vers = 'a')

-- One more index
create index idx_temp_id on temp(id)
 with
  fillfactor=90;


print ' 2 indexes query 1 '
SELECT ta.id, tb.id, ta.vers, tb.vers
FROM (SELECT * FROM temp WHERE vers = 'a') ta
FULL JOIN (SELECT * FROM temp WHERE vers = 'b') tb on ta.id = tb.id 
WHERE (ta.id is null or tb.id is null)
;
print ' 2 indexes query 2 '
SELECT ta.id, tb.id, ta.vers, tb.vers
FROM temp ta
FULL JOIN temp tb on ta.id = tb.id and ta.vers = 'a' and tb.vers = 'b'
WHERE (ta.id is null or tb.id is null) and (ta.vers = 'a' or tb.vers = 'b')
;
print ' 2 indexes query 3 '
SELECT ta.id, TA.vers 
  from temp ta 
  where ta.vers = 'a' 
         and TA.id NOT IN(SELECT tb.id FROM temp tb WHERE tb.vers = 'b')
UNION ALL 
SELECT tb.id, Tb.vers 
  from temp tb 
  where tb.vers = 'b' 
         and Tb.id NOT IN(SELECT ta.id FROM temp ta WHERE ta.vers = 'a')


SET STATISTICS TIME OFF;

Results

 1 index query 1 
(49898 row(s) affected)
 SQL Server Execution Times:
   CPU time = 156 ms,  elapsed time = 3825 ms.

 1 index query 2 
(49898 row(s) affected)
 SQL Server Execution Times:
   CPU time = 281 ms,  elapsed time = 2962 ms.

 1 index query 3 
(49898 row(s) affected)
 SQL Server Execution Times:
   CPU time = 422 ms,  elapsed time = 2508 ms.

 2 indexes query 1 
(49898 row(s) affected)
 SQL Server Execution Times:
   CPU time = 172 ms,  elapsed time = 2679 ms.

 2 indexes query 2 
(49898 row(s) affected)
 SQL Server Execution Times:
   CPU time = 406 ms,  elapsed time = 3468 ms.

 2 indexes query 3 
(49898 row(s) affected)
 SQL Server Execution Times:
   CPU time = 407 ms,  elapsed time = 3728 ms.
like image 148
Serg Avatar answered Dec 09 '25 19:12

Serg



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!