Im currently working with the northwind database and want to see the companies with more orders place in 1997. Im being ask to use windows function so i wrote this
select c.customerid,
c.companyname,
rank() over (order by count(orderid) desc )
from customers c
inner join orders o on c.customerid = o.customerid
where date_part('year',orderdate) = 1997;
However this code ask me to use GROUP BY with c.customerid. And i simply don't understand why. Supposedly this code will give me all the customers id and names and after that the window function kicks in giving them a rank base on the amount of orders. So why group them?
Here:
rank() over (order by count(orderid) desc )
You have an aggregate function in the over() clause of the window function (count(orderid)), so you do need a group by clause. Your idea is to put in the same group all orders of the same customer:
select c.customerid,
c.companyname,
rank() over (order by count(*) desc) as rn
from customers c
inner join orders o on c.customerid = o.customerid
where o.orderdate = date '1997-01-01' and o.orderdate < '1998-01-01'
group by c.customerid;
Notes:
Filtering on literal dates is much more efficient than applying a date function on the date column
count(orderid) is equivalent to count(*) in the context of this query
Postgres understands functionnaly-dependent column: assuming that customerid is the primary key of customer, it is sufficient to put just that column in the group by clause
It is a good practice to give aliases to expressions in the select clause
Another good practice is to prefix all columns with the (alias of) table they belong to
You would use it correctly in an aggregation query. That would be:
select c.customerid, c.companyname, count(*) as num_orders,
rank() over (order by count(*) desc) as ranking
from customers c inner join
orders o
on c.customerid = o.customerid
where date_part('year',orderdate) = 1997
group by c.customerid, c.companyname;
This counts the number of orders per customer in 1997. It then ranks the customers based on the number of orders.
I would advise you to use:
where orderdate >= '1997-01-01' and
orderdate < '1998-01-01'
For the filtering by year. This allows Postgres to use an index if one is available.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With