Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

create id column based on activity data

I have a table EVENTS

USER  EVENT_TS             EVENT_TYPE
abc   2016-01-01 08:00:00  Login
abc   2016-01-01 08:25:00  Stuff
abc   2016-01-01 10:00:00  Stuff
abc   2016-01-01 14:00:00  Login
xyz   2015-12-31 18:00:00  Login
xyz   2016-01-01 08:00:00  Logout

What I need to do is produce a session field for each period of activity for each user. In addition, if the user has been idle for a period equal to or longer than p_timeout (1 hour in this case) then a new session starts at the next activity. Users don't always log out cleanly, so the logout isn't walways there...

Notes:

Logout always terminates a session
There doesn't have to be a logout or a login (because software)
Login is always a new session

Output like

USER  EVENT_TS             EVENT_TYPE  SESSION
abc   2016-01-01 08:00:00  Login       1
abc   2016-01-01 08:25:00  Stuff       1
abc   2016-01-01 10:00:00  Stuff       2
abc   2016-01-01 14:00:00  Login       3
xyz   2015-12-31 18:00:00  Login       1
xyz   2016-01-01 08:00:00  Logout      1

Any thoughts on how to acheive this?

like image 631
JohnHC Avatar asked Mar 24 '26 15:03

JohnHC


2 Answers

I think this may do what you need. I changed "user" to "usr" in the input, and "session" to "sess" in the output - I don't ever use reserved Oracle words for object names.

Note: as Boneist pointed out below, my solution will assign a session number of 0 to the first session, if it is a Logout event (or a succession of Logouts right at the top). If this situation can occur in the data, and if the desired behavior is to start session counts at 1 even in that case, then the definition of flag must be tweaked - for example, by making flag = 1 when lag(event_ts) over (partition by usr order by event_ts) is null as well.

Good luck!

with
     events ( usr, event_ts, event_type ) as (
       select 'abc', to_timestamp('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
       select 'abc', to_timestamp('2016-01-01 08:25:00', 'yyyy-mm-dd hh24:mi:ss'), 'Stuff' from dual union all
       select 'abc', to_timestamp('2016-01-01 10:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Stuff' from dual union all
       select 'abc', to_timestamp('2016-01-01 14:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
       select 'xyz', to_timestamp('2015-12-31 18:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Login' from dual union all
       select 'xyz', to_timestamp('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss'), 'Logout' from dual
     ),
     start_of_sess ( usr, event_ts, event_type, flag ) as (
       select usr, event_ts, event_type,
              case when event_type != 'Logout' 
                    and
                        (    event_ts >= lag(event_ts) over (partition by usr 
                                                             order by event_ts) + 1/24 
                          or event_type = 'Login'
                          or lag(event_type) over (partition by usr 
                                                   order by event_ts) = 'Logout'
                        )
                   then 1 end
       from   events
     )
select usr, event_ts, event_type,
       count(flag) over (partition by usr order by event_ts) as sess
from   start_of_sess
; 

Output (timestamps use my current NLS_TIMESTAMP_FORMAT setting):

USR EVENT_TS                          EVENT_TYPE   SESS
--- --------------------------------- ---------- ------
abc 01-JAN-2016 08.00.00.000000000 AM Login           1
abc 01-JAN-2016 08.25.00.000000000 AM Stuff           1
abc 01-JAN-2016 10.00.00.000000000 AM Stuff           2
abc 01-JAN-2016 02.00.00.000000000 PM Login           3
xyz 31-DEC-2015 06.00.00.000000000 PM Login           1
xyz 01-JAN-2016 08.00.00.000000000 AM Logout          1

 6 rows selected     

I think this will do the trick:

WITH EVENTS AS (SELECT 'abc' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
                SELECT 'abc' usr, to_date('2016-01-01 08:25:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Stuff' event_type FROM dual UNION ALL
                SELECT 'abc' usr, to_date('2016-01-01 10:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Stuff' event_type FROM dual UNION ALL
                SELECT 'abc' usr, to_date('2016-01-01 14:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
                SELECT 'xyz' usr, to_date('2015-12-31 18:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'login' event_type FROM dual UNION ALL
                SELECT 'xyz' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual UNION ALL
                SELECT 'def' usr, to_date('2016-01-01 08:00:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual UNION ALL
                SELECT 'def' usr, to_date('2016-01-01 08:15:00', 'yyyy-mm-dd hh24:mi:ss') event_ts, 'Logout' event_type FROM dual)
SELECT usr,
       event_ts,
       event_type,
       SUM(counter) OVER (PARTITION BY usr ORDER BY event_ts) session_id
FROM   (SELECT usr,
               event_ts,
               event_type,
               CASE WHEN LAG(event_type, 1, 'Logout') OVER (PARTITION BY usr ORDER BY event_ts) = 'Logout' THEN 1
                    WHEN event_type = 'Logout' THEN 0
                    WHEN event_ts - LAG(event_ts) OVER (PARTITION BY usr ORDER BY event_ts) > 1/24 THEN 1
                    WHEN event_type = 'login' THEN 1
                    ELSE 0
               END counter
        FROM   EVENTS);

USR EVENT_TS            EVENT_TYPE SESSION_ID
--- ------------------- ---------- ----------
abc 2016-01-01 08:00:00 login               1
abc 2016-01-01 08:25:00 Stuff               1
abc 2016-01-01 10:00:00 Stuff               2
abc 2016-01-01 14:00:00 login               3
def 2016-01-01 08:00:00 Logout              1
def 2016-01-01 08:15:00 Logout              2
xyz 2015-12-31 18:00:00 login               1
xyz 2016-01-01 08:00:00 Logout              1

This solution relies on the logic-short circuiting that takes place in the CASE expression and the fact that the event_type is not null. It also assumes that multiple logouts in a row are counted as separate sessions:

  1. If the previous row was a logout row (and if there is no previous row - i.e. for the first row in the set - treat it as if a logout row was present), we want to increase the counter by one. (Logouts terminate the session, so we always have a new session following a logout.)
  2. If the current row is a logout, then this terminates the existing session. Therefore, the counter shouldn't be increased.
  3. If the time of the current row is greater than an hour from the previous row, increase the counter by one.
  4. If the current row is a login row, then it's a new session, so increase the counter by one.
  5. For any other case, we don't increase the counter.

Once we've done that, it's just a matter of doing a running total on the counter.

like image 36
Boneist Avatar answered Mar 26 '26 05:03

Boneist



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!