Why use GROUP BY in WINDOW FUNCTION

Question

Im currently working with the northwind database and want to see the companies with more orders place in 1997. Im being ask to use windows function so i wrote this

select c.customerid,
       c.companyname,
       rank() over (order by count(orderid) desc )
from customers c
inner join orders o on c.customerid = o.customerid
where date_part('year',orderdate) = 1997;

However this code ask me to use GROUP BY with c.customerid. And i simply don't understand why. Supposedly this code will give me all the customers id and names and after that the window function kicks in giving them a rank base on the amount of orders. So why group them?

GMB · Accepted Answer

Here:

rank() over (order by count(orderid) desc )

You have an aggregate function in the over() clause of the window function (count(orderid)), so you do need a group by clause. Your idea is to put in the same group all orders of the same customer:

select c.customerid,
       c.companyname,
       rank() over (order by count(*) desc) as rn
from customers c
inner join orders o on c.customerid = o.customerid
where o.orderdate = date '1997-01-01' and o.orderdate < '1998-01-01'
group by c.customerid;

Notes:

Filtering on literal dates is much more efficient than applying a date function on the date column
count(orderid) is equivalent to count(*) in the context of this query
Postgres understands functionnaly-dependent column: assuming that customerid is the primary key of customer, it is sufficient to put just that column in the group by clause
It is a good practice to give aliases to expressions in the select clause
Another good practice is to prefix all columns with the (alias of) table they belong to

Gordon Linoff · Answer

You would use it correctly in an aggregation query. That would be:

select c.customerid, c.companyname, count(*) as num_orders,
      rank() over (order by count(*) desc) as ranking
from customers c inner join 
     orders o
     on c.customerid = o.customerid
where date_part('year',orderdate) = 1997
group by c.customerid, c.companyname;

This counts the number of orders per customer in 1997. It then ranks the customers based on the number of orders.

I would advise you to use:

where orderdate >= '1997-01-01' and
      orderdate < '1998-01-01'

For the filtering by year. This allows Postgres to use an index if one is available.

Why use GROUP BY in WINDOW FUNCTION

Tags:

sql

postgresql

count

aggregate-functions

window-functions

Pablo Pizarro

2 Answers

GMB

Gordon Linoff

Recent Activity

Donate For Us

Why use GROUP BY in WINDOW FUNCTION

Tags:

sql

postgresql

count

aggregate-functions

window-functions

Pablo Pizarro

2 Answers

GMB

Gordon Linoff

Related questions

Recent Activity

Donate For Us