I have a particular SQL query that seems to suffer from a mysterious performance issue. Here is the query:
SELECT COUNT(LengthOfTime) AS TotalTime, 
       SUM(LengthOfTime) AS TotalLength, 
       SUM(LengthOfTime) / COUNT(LengthOfTime) AS AverageTime, 
       SUM(Pops) / COUNT(LengthOfTime) AS AveragePop 
  FROM ((SELECT * 
           FROM (SELECT *, ID & YearRec AS ID2 
                   FROM MyFirstTable 
                 UNION ALL 
                 SELECT *, ID & YearRec AS ID2 
                   FROM Table2011) AS TEMP 
          WHERE STARTTIME >= '8/1/2011 00:00:00' 
            AND StartTime <= '8/5/2011 23:59:59' ) AS TEMP2 
  JOIN AppleTable ON TEMP2.Reason = AppleTable.Skills ) 
  JOIN PeopleTable ON TEMP2.Operator = PeopleTable.Operators 
 WHERE AppleTable.[ON] = 1 
   AND PeopleTable.[ON] = 1 
   AND Rec_Type = 'SECRET AGENT'
The issue here is that this query runs very quickly (0:00 to 0:02) when run for a 5 day span, but very slowly (1:20 to 1:45) for a 6 day span.
There are approximately 105,000 records per day in the Tables (MyFirstTable and Table2011).
My question: Is there an upper limit to the number of rows you can pass an aggregate function before you see a serious performance issue in SQL Server? (currently using 2008 R2)
Why are SQL aggregate functions important? Aggregate functions are important tools for calculating and working with numerical data in a database. Using aggregate functions can allow you to extract key pieces of data from a set for further use.
Database-level aggregation is much faster than executing the same aggregation logic at the application level while looping on large arrays. Using queries involving SQL aggregate functions with the GROUP BY statement allows you to reduce the number of rows returned drastically.
Short answer: No, there's not some magic number of records that will cause MSSQL to start performing poorly.
Now, it's possible queries won't scale well and, as a result, the larger the dataset the [exponentially] worse it performs.
A large problem you're going to have is that you're predicating the StartTime after the UNIONED statements. Instead, try predicating on that in both of your selects prior to the UNION. That should make a huge difference, especially if you index both tables on StartTime (generating index seeks on those tables).
SELECT * FROM (
SELECT *, ID & YearRec AS ID2 FROM MyFirstTable 
   WHERE STARTTIME >= '8/1/2011 00:00:00' 
   AND STARTTIME <= '8/5/2011 23:59:59'
UNION ALL SELECT *, ID & YearRec AS ID2 
FROM Table2011
   WHERE STARTTIME >= '8/1/2011 00:00:00' 
   AND STARTTIME <= '8/5/2011 23:59:59'
) AS TEMP 
You may be able to do some additional refactoring of your code as well.
No, there is no pre-defined upper limit for aggregate functions.
The skew in performance is likely affected by one or multiple of the following:
You can run the query in SSMS and view the actual execution plan. This will tell you the places where the cost of running the query is the highest, and that will help you determine the best course of action.
Edit based on comments:
If there isn't an index on Table2011 that contains [STARTTIME], then create one. If there is an index, but it is getting ignored, then you have to figure out why. If the is fragmented, then rebuilding the index will definitely help. Here is how to rebuild
ALTER INDEX [YourIndexName] ON [dbo].[Table2011] REBUILD WITH (STATISTICS_NORECOMPUTE = ON);
Alternately you can do this in SSMS - browse to the specific index in the object browser, right click and rebuild.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With