Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I pair rows together in MYSQL?

I'm working on a simple time tracking app.

I've created a table that logs the IN and OUT times of employees.

Here is an example of how my data currently looks:

E_ID | In_Out |      Date_Time
------------------------------------
  3  |   I    | 2012-08-19 15:41:52
  3  |   O    | 2012-08-19 17:30:22
  1  |   I    | 2012-08-19 18:51:11
  3  |   I    | 2012-08-19 18:55:52
  1  |   O    | 2012-08-19 20:41:52
  3  |   O    | 2012-08-19 21:50:30

Im trying to create a query that will pair the IN and OUT times of an employee into one row like this:

E_ID |       In_Time       |      Out_Time
------------------------------------------------
  3  | 2012-08-19 15:41:52 | 2012-08-19 17:30:22
  3  | 2012-08-19 18:55:52 | 2012-08-19 21:50:30
  1  | 2012-08-19 18:51:11 | 2012-08-19 20:41:52

I hope I'm being clear in what I'm trying to achieve here. Basically I want to generate a report that had both the in and out time merged into one row.

Any help with this would be greatly appreciated. Thanks in advance.

like image 527
patskot Avatar asked Sep 07 '25 04:09

patskot


1 Answers

There are three basic approaches I can think of.

One approach makes use of MySQL user variables, one approach uses a theta JOIN, another uses a subquery in the SELECT list.

theta-JOIN

One approach is to use a theta-JOIN. This approach is a generic SQL approach (no MySQL specific syntax), which can work with multiple RDBMS.

N.B. With a large number of rows, this approach can create a significantly large intermediate result set, which can lead to problematic performance.

SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time    
  FROM e `o`
  LEFT
  JOIN e `i` ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
 WHERE o.in_out = 'O'
 GROUP BY o.e_id, o.date_time
 ORDER BY o.date_time

What this does is match every 'O' row for an employee with every 'I' row that is earlier, and then we use the MAX aggregate to pick out the 'I' record with the closest date time.

This works for perfectly paired data; could produce odd results for imperfect pairs... (two consecutive 'O' records with no intermediate 'I' row, will both get matched to the same 'I' row, etc.)


correlated subquery in SELECT list

Another approach is to use a correlated subquery in the SELECT list. This can have sub-optimal performance, but is sometimes workable (and is occasionally the fastest way to return the specified result set... this approach works best when we have a limited number of rows returned in the outer query.)

 SELECT o.e_id
      , (SELECT MAX(i.date_time)
           FROM e `i`
          WHERE i.in_out = 'I'
            AND i.e_id = o.e_id
            AND i.date_time < o.date_time
        ) AS in_time
      , o.date_time AS out_time
   FROM e `o`
  WHERE o.in_out = 'O'
  ORDER BY o.date_time

User variables

Another approach is to make use of MySQL user variables. (This is a MySQL-specific approach, and is a workaround to the "missing" analytic functions.)

What this query does is order all of the rows by e_id, then by date_time, so we can process them in order. Whenever we encounter an 'O' (out) row, we use the value of date_time from the immediately preceding 'I' row as the 'in_time')

N.B.: This usage of MySQL user variables is dependent on MySQL performing operations in a specific order, a predictable plan. The use of the inline views (or "derived tables", in MySQL parlance) gets us a predictable execution plan. But this behavior is subject to change in future releases of MySQL.

SELECT c.e_id
     , CAST(c.in_time AS DATETIME) AS in_time
     , c.out_time
  FROM (
         SELECT IF(@prev_e_id = d.e_id,@in_time,@in_time:=NULL) AS reset_in_time
              , @in_time := IF(d.in_out = 'I',d.date_time,@in_time) AS in_time
              , IF(d.in_out = 'O',d.date_time,NULL) AS out_time
              , @prev_e_id := d.e_id  AS e_id
           FROM (
                  SELECT e_id, date_time, in_out 
                    FROM e
                    JOIN (SELECT @prev_e_id := NULL, @in_time := NULL) f
                   ORDER BY e_id, date_time, in_out 
                 ) d
       ) c
 WHERE c.out_time IS NOT NULL
 ORDER BY c.out_time

This works for the set of data you have, it needs more thorough testing and tweaking to ensure you get the result set you want with quirky data, when the rows are not perfectly paired (e.g. two 'O' rows with no 'I' row between them, an 'I' row with no subsequent 'O' row, etc.)

SQL Fiddle

like image 168
spencer7593 Avatar answered Sep 10 '25 02:09

spencer7593