I have a number of tables that are joined and have a maximum number of rows about 4 million records. We are searching this table within a Stored Procedure and have an optional parameter with a default value of NULL, below is an redacted example of what we are running, there are more tables involved in the join but only 1 field has a WHERE clause;
DECLARE     @OwnerId VARCHAR (50)=NULL
SET @OwnerId = 'A123456'
SELECT DISTINCT
  t1.Id,
  t2.OwnerId,
FROM
  table1 t1
  INNER JOIN [table2] t2 m ON t1.Id = t2.id
WHERE   
  t2.OwnerId = @OwnerId  
There is an index on OwnerId. Running the query as above returns the results (600) in less than 1 second. However, as indicated the parameter is optional (along with others) and as soon as I change it slightly to include the value if not NULL (which is what I believe this is doing), the same search goes up to over 10 seconds.
SET @OwnerId = 'A123456'
SELECT DISTINCT
  t1.Id,
  t2.OwnerId,
FROM
  table1 t1
  INNER JOIN [table2] t2 ON t1.Id = t2.id
WHERE   
  (@OwnerId IS null OR (t2.OwnerId = @OwnerId))  
I am not able to run an execution plan on the production server but can see a slightly difference when running on a dev server (this does not have 4 million rows), so something is changing, but unsure as to what.
Queries like this can suffer hugely from query plan caching. When you have an SP with a clause like WHERE Column = @Param OR @Param IS NULL, SQL Server will cache the plan fthe first time the SP is run. If @Param has a non-NULL value, then that plan could be awful for when it has the value NULL, as the row estimates will be severely lower than what it will actually return.
Also, a plan with a non-NULL value is likely to involve an Index Seek (I hope you have good indexing), where as a Seek across the entire table would be very poor for performance (likely worse than a scan).
For something as simply as this, you can probably "get away" with simply adding OPTION (RECOMPILE); which will force the DBMS to recreate the plan each time you run the query. This does, however, have an overhead, and with complex queries you should probably use Dynamic SQL to generate a dynamic WHERE clause:
SELECT DISTINCT
  t1.Id,
  t2.OwnerId,
FROM
  table1 t1
  INNER JOIN [table2] t2 ON t1.Id = t2.id
WHERE   
  (@OwnerId IS null OR (t2.OwnerId = @OwnerId))  
OPTION (RECOMPILE);
DISTINCT, on a table with 4 million rows, and asking for every row, is going to be problematic anyway; as that is a lot of overhead for the server as it'll need to sort all the rows. If you have indexes on t1.Id and t2.OwnerId, then that will help the RDBMS; but if you're doing a DISTINCT on that large of a dataset, that does infer a design flaw, in my view.
Gail Shaw and Aaron Bertrand, have also written some articles on what they coin as Catch-All Queries and Kitchen Sink Queries respectively, which are likely worth a read.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With