I need to group a sequence of rows that have the same value in one determinate field. For instance, I have a series of Selina Kyle's records between two Bruce Wayne's records. I need to group those records by username, but as long as they are in a immediate sequence. For instance, I have this table:
|User | Time |Date_In |Date_Out |
|Bruce Wayne | 2793 |2017-08-30 09:55:52 |2017-08-30 10:42:25 |
|Selina Kyle | 2430 |2017-08-30 10:42:25 |2017-08-30 11:22:55 |
|Selina Kyle | 4461 |2017-08-30 11:22:55 |2017-08-30 12:37:16 |
|Selina Kyle | 4356 |2017-08-30 12:37:16 |2017-08-30 13:49:52 |
|Selina Kyle | 2295 |2017-08-30 13:49:52 |2017-08-30 14:28:07 |
|Bruce Wayne | 2098 |2017-08-30 14:28:07 |2017-08-30 15:03:05 |
I need to group it by username and sum time, but I need the Bruce Wayne records separetely, because they are not in a immediate sequence:
|User |Time |Date_In |Date_Out |
|Bruce Wayne |2793 |2017-08-30 09:55:52 |2017-08-30 10:42:25|
|Selina Kyle |13542 |2017-08-30 10:42:25 |2017-08-30 14:28:07|
|Bruce Wayne |2098 |2017-08-30 14:28:07 |2017-08-30 15:03:05|
try below for BigQuery Standard SQL
#standardSQL
SELECT MIN(User) AS User, SUM(TIME) AS TIME, MIN(Date_In) AS Date_In, MAX(Date_Out) AS Date_Out
FROM (
SELECT *,
COUNTIF(User != IFNULL(prev_User, User)) OVER(ORDER BY Date_In) AS groupid
FROM (
SELECT *,
LAG(User) OVER(ORDER BY Date_In) AS prev_User
FROM `yourTable`
ORDER BY Date_In
)
)
GROUP BY groupid
-- ORDER BY Date_In
You can play/test this with dummy data from your question as in below
#standardSQL
WITH `yourTable` AS (
SELECT 'Bruce Wayne'AS User, 2793 AS TIME, '2017-08-30 09:55:52' AS Date_In, '2017-08-30 10:42:25' AS Date_Out UNION ALL
SELECT 'Selina Kyle', 2430, '2017-08-30 10:42:25', '2017-08-30 11:22:55' UNION ALL
SELECT 'Selina Kyle', 4461, '2017-08-30 11:22:55', '2017-08-30 12:37:16' UNION ALL
SELECT 'Selina Kyle', 4356, '2017-08-30 12:37:16', '2017-08-30 13:49:52' UNION ALL
SELECT 'Selina Kyle', 2295, '2017-08-30 13:49:52', '2017-08-30 14:28:07' UNION ALL
SELECT 'Bruce Wayne', 2098, '2017-08-30 14:28:07', '2017-08-30 15:03:05'
)
SELECT MIN(User) AS User, SUM(TIME) AS TIME, MIN(Date_In) AS Date_In, MAX(Date_Out) AS Date_Out
FROM (
SELECT *,
COUNTIF(User != IFNULL(prev_User, User)) OVER(ORDER BY Date_In) AS groupid
FROM (
SELECT *,
LAG(User) OVER(ORDER BY Date_In) AS prev_User
FROM `yourTable`
ORDER BY Date_In
)
)
GROUP BY groupid
ORDER BY Date_In
Please note - from your example it looks like you have case when there are no overlapping of date_out and date_in in consecutive rows - if you do have - above query needs to be further adjusted to reflect logic of how to treat such case
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With