I'm doing paging with SQL Server and I'd like to avoid duplication by counting the total number of results as part of my partial resultset, rather than getting that resultset and then doing a separate query to get the count afterwards. However, the trouble is, it seems to be increasing execution time. For example, if I check with SET STATISTICS TIME ON, this:
WITH PagedResults AS (
SELECT
ROW_NUMBER() OVER (ORDER BY AggregateId ASC) AS RowNumber,
COUNT(PK_MatrixItemId) OVER() AS TotalRowCount,
*
FROM [MyTable] myTbl WITH(NOLOCK)
)
SELECT * FROM PagedResults
WHERE RowNumber BETWEEN 3 AND 4810
... or this (whose execution plan is identical):
SELECT * FROM (
SELECT TOP (4813)
ROW_NUMBER() OVER (ORDER BY AggregateId ASC) AS RowNumber,
COUNT(PK_MatrixItemId) OVER() AS TotalRowCount,
*
FROM [MyTable] myTbl WITH(NOLOCK)
) PagedResults
WHERE PagedResults.RowNumber BETWEEN 3 AND 4810
... seems to be averaging a CPU time (all queries added up) of 1.5 to 2 times as much as this:
SELECT * FROM (
SELECT TOP (4813)
ROW_NUMBER() OVER (ORDER BY AggregateId ASC) AS RowNumber,
*
FROM [MyTable] myTbl WITH(NOLOCK)
) PagedResults
WHERE PagedResults.RowNumber BETWEEN 3 AND 4810
SELECT COUNT(*) FROM [MyTable] myTbl WITH(NOLOCK)
Obviously I'd rather use the former than the latter because the latter redundantly repeats the FROM clause (and would repeat any WHERE clauses if I had any), but its execution time is so much better I really have to use it. Is there a way I can get the former's execution time down at all?
CTE's are inlined into the query plan. They perform exactly the same as derived tables do.
Derived tables do not correspond to physical operations. They do not "materialize" the result set into a temp table. (I believe MySQL does this, but MySQL is about the most primitive mainstream RDBMS there is.)
Using OVER() does indeed manifest itself in the query plan as buffering to a temp table. It is not at all clear why that would be faster here than just re-reading the underlying table. Buffering is rather slow because writes are more CPU intensive than reads in SQL Server. We can just read twice from the original table. That's probably why the latter option is faster.
If you want to avoid repeating parts of a query, use a view or table-valued function. Granted, these are not great options for one-off queries. You can also generate SQL in the application layer and reuse strings. ORMs also make this a lot easier.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With