Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use GROUP BY in WINDOW FUNCTION

Im currently working with the northwind database and want to see the companies with more orders place in 1997. Im being ask to use windows function so i wrote this

select c.customerid,
       c.companyname,
       rank() over (order by count(orderid) desc )
from customers c
inner join orders o on c.customerid = o.customerid
where date_part('year',orderdate) = 1997;

However this code ask me to use GROUP BY with c.customerid. And i simply don't understand why. Supposedly this code will give me all the customers id and names and after that the window function kicks in giving them a rank base on the amount of orders. So why group them?

like image 377
Pablo Pizarro Avatar asked Dec 04 '25 17:12

Pablo Pizarro


2 Answers

Here:

rank() over (order by count(orderid) desc )

You have an aggregate function in the over() clause of the window function (count(orderid)), so you do need a group by clause. Your idea is to put in the same group all orders of the same customer:

select c.customerid,
       c.companyname,
       rank() over (order by count(*) desc) as rn
from customers c
inner join orders o on c.customerid = o.customerid
where o.orderdate = date '1997-01-01' and o.orderdate < '1998-01-01'
group by c.customerid;

Notes:

  • Filtering on literal dates is much more efficient than applying a date function on the date column

  • count(orderid) is equivalent to count(*) in the context of this query

  • Postgres understands functionnaly-dependent column: assuming that customerid is the primary key of customer, it is sufficient to put just that column in the group by clause

  • It is a good practice to give aliases to expressions in the select clause

  • Another good practice is to prefix all columns with the (alias of) table they belong to

like image 122
GMB Avatar answered Dec 06 '25 08:12

GMB


You would use it correctly in an aggregation query. That would be:

select c.customerid, c.companyname, count(*) as num_orders,
      rank() over (order by count(*) desc) as ranking
from customers c inner join 
     orders o
     on c.customerid = o.customerid
where date_part('year',orderdate) = 1997
group by c.customerid, c.companyname;

This counts the number of orders per customer in 1997. It then ranks the customers based on the number of orders.

I would advise you to use:

where orderdate >= '1997-01-01' and
      orderdate < '1998-01-01'

For the filtering by year. This allows Postgres to use an index if one is available.

like image 29
Gordon Linoff Avatar answered Dec 06 '25 07:12

Gordon Linoff



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!