Ok, so I have a query:
select distinct(a)
from mytable
where
b in (0,3)
What is going to be faster, the above or
select distinct(a)
from mytable
where
b = 0
or
b = 3
Is there a general rule?
Thanks
The IN operator has less complexity, so it will be faster.
The best way to know is to profile both queries on your database with your own data and see which is faster. In one such comparison, the method using OR was about 30% slower, and adding more terms made the difference larger.
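Here is a minimal profiling sketch. It uses Python's built-in sqlite3 module as a stand-in for your real database, and the table contents are made up, so the timings say nothing about SQL Server. Each query runs several times and only the fastest run is kept, which reduces noise:

```python
import sqlite3
import time

# Hypothetical schema and data; SQLite is only a stand-in for your real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO mytable (a, b) VALUES (?, ?)",
    [(i % 1000, i % 10) for i in range(50_000)],
)
conn.execute("CREATE INDEX idx_b ON mytable (b)")

QUERY_IN = "SELECT DISTINCT a FROM mytable WHERE b IN (0, 3)"
QUERY_OR = "SELECT DISTINCT a FROM mytable WHERE b = 0 OR b = 3"


def best_time(sql, runs=10):
    """Run sql several times; return the fastest wall-clock time and the rows."""
    best = float("inf")
    rows = None
    for _ in range(runs):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows


t_in, rows_in = best_time(QUERY_IN)
t_or, rows_or = best_time(QUERY_OR)
print(f"IN: {t_in * 1000:.2f} ms, OR: {t_or * 1000:.2f} ms")
```

These timings include Python's fetch overhead. On SQL Server itself, running SET STATISTICS TIME ON in SSMS reports per-query CPU and elapsed times instead.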
The EXISTS clause is much faster than IN when the subquery result is very large. Conversely, the IN clause is faster than EXISTS when the subquery result is very small. Also, IN cannot match NULL values, because a comparison with NULL evaluates to UNKNOWN rather than true. This matters most with NOT IN: if the list or subquery contains a NULL, NOT IN returns no rows at all, whereas NOT EXISTS is unaffected.
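The NULL behaviour is easiest to see with the negated forms. The following is a small sketch using Python's sqlite3 module, where the orders and blocked tables are hypothetical. A single NULL in the subquery makes NOT IN return nothing, while NOT EXISTS still returns the unblocked customers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: blocked contains a NULL customer_id.
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER);
    CREATE TABLE blocked (customer_id INTEGER);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO blocked VALUES (2), (NULL);
""")

# 1 NOT IN (2, NULL) is UNKNOWN, not true, so no row qualifies.
not_in = conn.execute(
    "SELECT customer_id FROM orders "
    "WHERE customer_id NOT IN (SELECT customer_id FROM blocked)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row never matches.
not_exists = conn.execute(
    "SELECT o.customer_id FROM orders AS o "
    "WHERE NOT EXISTS (SELECT 1 FROM blocked AS bl "
    "                  WHERE bl.customer_id = o.customer_id) "
    "ORDER BY o.customer_id"
).fetchall()

print("NOT IN:", not_in)
print("NOT EXISTS:", not_exists)
```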
Both IN and OR will do a query for b = 0 followed by one for b = 3, and then do a merge join on the two result sets, and finally filter out any duplicates.
With IN, duplicates don't really make sense, because b can't be both 0 and 3. However, IN is converted to b = 0 OR b = 3, and with OR, duplicates do make sense. You could write b = 0 OR a = 3, and if the two separate result sets were simply joined together, every record that matched both criteria would appear twice.
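To illustrate why OR in general needs duplicate filtering, here is a sketch using Python's sqlite3 module and a made-up four-row table. Evaluating b = 0 OR a = 3 as a single predicate returns each row once. Running the two halves separately and concatenating them with UNION ALL returns row 1 twice, because it matches both halves:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
# Invented rows: row 1 satisfies both b = 0 and a = 3.
conn.executemany(
    "INSERT INTO mytable (id, a, b) VALUES (?, ?, ?)",
    [(1, 3, 0), (2, 5, 0), (3, 3, 7), (4, 9, 9)],
)

# One predicate: each row is returned at most once.
with_or = conn.execute(
    "SELECT id FROM mytable WHERE b = 0 OR a = 3 ORDER BY id"
).fetchall()

# The same predicate split into two queries whose results are simply concatenated.
concatenated = conn.execute(
    "SELECT id FROM mytable WHERE b = 0 "
    "UNION ALL "
    "SELECT id FROM mytable WHERE a = 3 "
    "ORDER BY id"
).fetchall()

print("OR:", with_or)
print("two halves, UNION ALL:", concatenated)
```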
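So duplicate filtering will always be done, regardless of whether you use IN or OR. However, if you know from the outset that there will be no duplicates (which is usually the case when you're using IN), you can gain some performance by using UNION ALL, which doesn't filter out duplicates. Be aware that in this version each DISTINCT only removes duplicates within its own branch, so a value of a that occurs with both b = 0 and b = 3 is returned twice; use UNION instead if that matters: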
select distinct(a)
from mytable
where
b = 0
UNION ALL
select distinct(a)
from mytable
where
b = 3
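The following sketch compares what the UNION ALL form returns with the original DISTINCT query. It uses Python's sqlite3 module and invented data in which a = 7 occurs with both b = 0 and b = 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
# Invented data: a = 7 appears with both b = 0 and b = 3.
conn.executemany(
    "INSERT INTO mytable (a, b) VALUES (?, ?)",
    [(7, 0), (7, 3), (8, 0), (8, 0), (9, 3)],
)

original = conn.execute(
    "SELECT DISTINCT a FROM mytable WHERE b IN (0, 3) ORDER BY a"
).fetchall()

# Each DISTINCT only de-duplicates its own branch.
union_all = conn.execute(
    "SELECT DISTINCT a FROM mytable WHERE b = 0 "
    "UNION ALL "
    "SELECT DISTINCT a FROM mytable WHERE b = 3 "
    "ORDER BY a"
).fetchall()

# UNION removes duplicates across both branches.
union = conn.execute(
    "SELECT DISTINCT a FROM mytable WHERE b = 0 "
    "UNION "
    "SELECT DISTINCT a FROM mytable WHERE b = 3 "
    "ORDER BY a"
).fetchall()

print("original:", original)
print("UNION ALL:", union_all)
print("UNION:", union)
```

UNION (without ALL) matches the original result here, but it pays for the cross-branch de-duplication that UNION ALL is meant to skip.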
Hopefully, in this simple example it won't make any difference which version you use, because the query optimiser should turn them into equivalent queries under the hood. However, there's a fair chance that the result depends on the indexes you have on mytable. I would suggest running both queries in SQL Server Management Studio with "Include Actual Execution Plan" turned on, then comparing the results to see which query has the lowest "cost" in your scenario.
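To do this, open a query window containing both queries, turn on Query > Include Actual Execution Plan (Ctrl+M), and then execute them.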
The bottom "results" half of the window will now show a third tab, "Execution plan". This tab should contain two "flowcharts", one for the first query and one for the second, each headed by its query cost relative to the batch. If the two are identical, SQL Server has treated the two queries as equivalent, and you should choose whichever form you and your colleagues prefer.
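The same kind of plan comparison can be tried outside SSMS with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is only a rough analogue of SQL Server's actual execution plan. The index name idx_b is an assumption, and the exact plan text varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX idx_b ON mytable (b)")  # hypothetical index on b


def plan(sql):
    """Return the 'detail' column (4th column) of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_in = plan("SELECT DISTINCT a FROM mytable WHERE b IN (0, 3)")
plan_or = plan("SELECT DISTINCT a FROM mytable WHERE b = 0 OR b = 3")

for line in plan_in:
    print("IN:", line)
for line in plan_or:
    print("OR:", line)
print("identical plans:", plan_in == plan_or)
```

SQLite's optimizer documentation describes rewriting OR-connected equality terms on the same column into an IN, so the two plans are expected to match there. Whether SQL Server does the same for your tables is what the SSMS plans will show.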
As far as I know, IN is converted to OR, so the performance is the same; IN is just a shorter way of writing it.
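Whatever the internal rewriting, the two forms should always return the same rows. That includes rows where b is NULL, because a NULL b satisfies neither b IN (0, 3) nor b = 0 OR b = 3. The following sketch checks this on random, invented data with Python's sqlite3 module, using a plain Python filter as the reference answer:

```python
import random
import sqlite3

random.seed(0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
# Random values for b, including NULLs (None), to exercise three-valued logic.
rows = [(random.randint(0, 50), random.choice([0, 1, 2, 3, 4, None]))
        for _ in range(5_000)]
conn.executemany("INSERT INTO mytable (a, b) VALUES (?, ?)", rows)

in_rows = set(conn.execute("SELECT DISTINCT a FROM mytable WHERE b IN (0, 3)"))
or_rows = set(conn.execute("SELECT DISTINCT a FROM mytable WHERE b = 0 OR b = 3"))

# Reference answer computed in Python: None is never in (0, 3).
expected = {(a,) for a, b in rows if b in (0, 3)}
print(in_rows == or_rows == expected)
```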