Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why doesn't this simple SQL query work?

Tags:

sql

sql-server

So I'm working on SQL Server 2008 and I have this query which should be quite simple, but for some reason doesn't work. It basically looks like that:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q.Score)        AS "Average Question Rating",  
    COUNT(q.Id)         AS "N. of Questions posted by the agent",
    AVG(a.Score)        AS "Average Answer Rating",  
    COUNT(a.Id)         AS "N. of Answers posted by the agent"
FROM    
        Users u, 
        Answers a, 
        Questions q
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate

When I only work on either the Answers table or the Questions table, everything is ok. But as soon as I try to do both at once (like in the query above), the COUNTs don't work at all. What I get is that the COUNT(a.Id) is identical to the COUNT(q.Id). So I tried reducing my query to see what was wrong, and I realized that I just had to add the Questions or the Answers table (even without using them anywhere) to the FROM clause when working with the other table and everything was ruined.

I'm sure it's something ridiculously trivial that I have overlooked but it's driving me crazy, I'd be thankful if anybody could point me what went wrong. Thank you in advance.

like image 372
Jérémie Clos Avatar asked Mar 25 '26 08:03

Jérémie Clos


2 Answers

You're not joining Answers and Questions correctly for the aggregation. Between Answers and Questions, the result is a cartesian product (for every user, every answer is coupled with every question)

The simplest way to correct this is to perform aggregation in subqueries:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    ISNULL((SELECT AVG(Score) FROM Answers   WHERE OwnerUserId = u.Id), 0)
                        AS "Average Question Rating",  
           (SELECT COUNT(*)   FROM Answers   WHERE OwnerUserId = u.Id)        
                        AS "N. of Questions posted by the agent",
    ISNULL((SELECT AVG(Score) FROM Questions WHERE OwnerUserId = u.Id), 0)
                        AS "Average Answer Rating",  
           (SELECT COUNT(*)   FROM Questions WHERE OwnerUserId = u.Id)
                        AS "N. of Answers posted by the agent"
FROM  Users u

Alternatively using joins:

SELECT TOP 10
     u.Id                AS "UserId",
     u.CreationDate      AS "Member since",
     ISNULL(q.a, 0)      AS "Average Question Rating",  
     ISNULL(q.c, 0)      AS "N. of Questions posted by the agent",
     ISNULL(a.a, 0)      AS "Average Answer Rating",  
     ISNULL(a.c, 0)      AS "N. of Answers posted by the agent"
FROM Users u
-- If you LEFT JOIN these tables, you'll get also results for users without
-- questions or answers
LEFT OUTER JOIN (SELECT OwnerUserId, AVG(Score) a, COUNT(*) c 
     FROM Questions GROUP BY OwnerUserId) q
     ON  q.OwnerUserId = u.Id
LEFT OUTER JOIN (SELECT OwnerUserId, AVG(Score) a, COUNT(*) c 
     FROM Answers GROUP BY OwnerUserId) a
     ON  a.OwnerUserId = u.Id

I don't know SQL Server's query optimiser well enough, so I can't say which one is going to be faster. The first solution could take advantage of scalar subquery caching, if that is available in SQL Server. Otherwise, the second query maybe performs less nested loops.

like image 161
Lukas Eder Avatar answered Mar 27 '26 23:03

Lukas Eder


As noted elsewhere, your join on user ID on both questions and answers essentially produces a cartesian join at the user level between the two tables. A better approach would be to use a union:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q_score)        AS "Average Question Rating",  
    COUNT(q_id)         AS "N. of Questions posted by the agent",
    AVG(a_score)        AS "Average Answer Rating",  
    COUNT(a_id)         AS "N. of Answers posted by the agent"
FROM Users u
JOIN (select OwnerUserId,
             Score        q_score,
             Id           q_id,
             NULL         a_score,
             NULL         a_id
      from Answers
      union all
      select OwnerUserId,
             NULL         q_score,
             NULL         q_id,
             Score        a_score,
             Id           a_id
      from Questions) qa
  ON qa.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!