Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the highest n values of each group in MySQL

I have some data formatted like this:

Lane         Series
1            680
1            685
1            688
2            666
2            425
2            775
...

And I'd like to grab the highest n series per lane (let's say 2 for the sake of this example, but it could be many more than that)

So the output should be:

Lane         Series
1            688
1            685
2            775
2            666

Getting the highest series per lane is easy, but I can't seem to find a way to get the highest 2 results.

I use a MAX aggregate function with a GROUP BY to get the MAX, but there's no "TOP N" function as in SQL Server and using ORDER BY... LIMIT only returns the highest N results overall, not per lane.

Since I use a JAVA application I coded myself to query the database and choose what N is, I could do a loop and use a LIMIT and loop through every lane, making a different query each time, but I want to learn how to do it using MySQL.

like image 865
Adam Smith Avatar asked Sep 05 '25 00:09

Adam Smith


1 Answers

This solution is the fastest for MySQL and will work with very large tables, but it uses "funky" MySQL features, so wouldn't be of use for other database flavours.

(Edited to sort before applying logic)

set @count:=-1, @lane:=0; 
select lane, series
from (select lane, series from lane_series order by lane, series desc) x
where if(lane != @lane, @count:=-1, 0) is not null
and if(lane != @lane, @lane:=lane, lane) is not null
and (@count:=@count+1) < 2; -- Specify the number of row at top of each group here

To put this query on steroids, define an index on lane and series: CREATE INDEX lane_series_idx on lane_series(lane, series); and it will do (super fast) index-only scan - so your other text columns don't affect it.

Good points of this query are:

  1. It requires only one table pass (albeit sorted)
  2. It handles ties at any level, for example if there's a tie for 2nd, only one of the 2nd will be displayed - ie the row count is absolute and never exceeded

Here's the test output:

create table lane_series (lane int, series int);

insert into lane_series values (1, 680),(1, 685),(1, 688),(2, 666),(2, 425),(2, 775);

-- Execute above query:

+------+--------+
| lane | series |
+------+--------+
|    1 |    688 |
|    1 |    685 |
|    2 |    775 |
|    2 |    666 |
+------+--------+
like image 78
Bohemian Avatar answered Sep 08 '25 00:09

Bohemian