Grouping to find min,max for each group

Question

This would be relatively easy if I only cared about a single min and max for each group, the problem is my requirement is to find the various boundaries. An example data set is as follows:

BoundaryColumn  GroupIdentifier
1                  A
3                  A
4                  A
7                  A
8                  B
9                  B  
11                 B  
13                 A
14                 A
15                 A
16                 A

What I need from the sql is a result set as follows:

min  max  groupid
1    7    A
8    11   B
13   16   A

Essentially finding the boundaries for each cluster of the groups.

The data would be stored in either oracle11g or mysql so syntax can be provided for either platform.

Ed Gibbs · Accepted Answer

A disclaimer: It's a lot easier to query partial results and process something like this with a front-end language. That said...

The following query works for Oracle (which supports analytic queries) but not for MySQL (which does not). There's a SQL Fiddle here.

WITH BoundX AS (
  SELECT * FROM (
    SELECT
     BoundaryColumn,
     GroupIdentifier,
     LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
     LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
    FROM MyTable
    ORDER BY BoundaryColumn
  )
  WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
     OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT MIN, MAX, GROUPID
FROM (
  SELECT
    BoundaryColumn AS MIN,
    LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
    GroupIdentifier AS GROUPID,
    GIDLag,
    GIDLead
  FROM BoundX
)
WHERE GROUPID = GIDLead

Here's the logic, step by step. You may be able to improve on this, because I get the feeling there's one subquery too many here...

This query pulls the prior and following GroupIdentifier values into each row:

SELECT
 BoundaryColumn,
 GroupIdentifier,
 LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
 LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn

The result looks like this:

BoundaryColumn  GroupIdentifier  GIDLag  GIDLead
1                  A                         A
3                  A                A        A
4                  A                A        A
7                  A                A        B
8                  B                A        B
9                  B                B        B
11                 B                B        A
13                 A                B        A
14                 A                A        A
15                 A                A        A
16                 A                A

If you add logic to get rid of all the rows where GIDLag = GIDLead = GroupIdentifier, you'll end up with the boundaries:

WITH BoundX AS (
  SELECT * FROM (
    SELECT
     BoundaryColumn,
     GroupIdentifier,
     LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
     LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
    FROM MyTable
    ORDER BY BoundaryColumn
  )
  WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
     OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT
  BoundaryColumn AS MIN,
  LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
  GroupIdentifier AS GROUPID,
  GIDLag,
  GIDLead
FROM BoundX

With this addition the results are:

MIN MAX GROUPID GIDLAG GIDLEAD
--- --- ------- ------ -------
  1   7 A              A
  7   8 A       A      B
  8  11 B       A      B
 11  13 B       B      A
 13  16 A       B      A
 16     A       A

Finally, include only those rows where GroupID = GIDLead. That's the query at the top of this answer. The results are:

MIN MAX GROUPID
--- --- -------
  1   7 A
  8  11 B
 13  16 A

Klazen108 · Answer

Take a look at this site regarding "runs" of data: http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data

Armed with the knowledge provided in that link, you could write a query like this:

SELECT BoundaryColumn,
GroupIdentifier,
(
SELECT COUNT(*)
FROM Table T
WHERE T.GroupIdentifier <> TR.GroupIdentifier
AND T.BoundaryColumn <= TR.BoundaryColumn
) as RunGroup
FROM Table TR

Using this information, you could then group by "RunGroup", and select the GroupIdentifier and min/max BoundaryColumn.

EDIT: I've felt the peer pressure, here's an SQLFiddle with my version of the answer: http://www.sqlfiddle.com/#!8/9a24c/4/0

Grouping to find min,max for each group

Tags:

sql

mysql

oracle11g

user2887330

2 Answers

Ed Gibbs

Klazen108

Recent Activity

Donate For Us

Grouping to find min,max for each group

Tags:

sql

mysql

oracle11g

user2887330

2 Answers

Ed Gibbs

Klazen108

Related questions

Recent Activity

Donate For Us