Let say I have the following table :
AccountID Email status_update date (dd/mm/yyyy)
123456 [email protected] state1 02/02/2016
123456 [email protected] state2 10/010/2018
456123 [email protected] state2 05/04/2017
789123 [email protected] state2 22/04/2016
789123 [email protected] state1 17/06/2018
456345 [email protected] state1 13/08/2017
456345 [email protected] state2 09/07/2015
456345 [email protected] state2 09/07/2014
here is the output I want :
UniqueID Email Most_recent_status_Update CountUniqueID
123456 [email protected] state2 2
456123 [email protected] state2 1
789123 [email protected] state1 2
456345 [email protected] state1 3
So basically I want to be able to group by Email, to do a unique(AccountID) and to select status_update based in MIN(date)
The difficult part to me is being able to select status_update based in MIN(date). I would like to be able to do something like this :
Select status_update when date = min(date)
I'm thinking about using CASE formula in order to have something like this :
Select CASE (WHEN date = min(date) Then status_update else null END
I always end up with all kind of error. I'm not able to build the whole query.
thanks
See example below - for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 123456 accountID, '[email protected]' email, 'state1' status_update, '02/02/2016' dt UNION ALL
SELECT 123456, '[email protected]', 'state2', '10/10/2018' UNION ALL
SELECT 456123, '[email protected]', 'state2', '05/04/2017' UNION ALL
SELECT 789123, '[email protected]', 'state2', '22/04/2016' UNION ALL
SELECT 789123, '[email protected]', 'state1', '17/06/2018' UNION ALL
SELECT 456345, '[email protected]', 'state1', '13/08/2017' UNION ALL
SELECT 456345, '[email protected]', 'state2', '09/07/2015' UNION ALL
SELECT 456345, '[email protected]', 'state2', '09/07/2014'
)
SELECT
accountID,
email,
ARRAY_AGG(status_update ORDER BY PARSE_DATE('%d/%m/%Y', dt) DESC LIMIT 1)[OFFSET (0)] most_recent_status_update,
COUNT(1) AS cnt
FROM `project.dataset.table`
GROUP BY accountID, email
with result as
Row accountID email most_recent_status_update cnt
1 456123 [email protected] state2 1
2 123456 [email protected] state2 2
3 789123 [email protected] state1 2
4 456345 [email protected] state1 3
for your real use case you should use something like below
#standardSQL
SELECT
accountID,
email,
ARRAY_AGG(status_update ORDER BY PARSE_DATE('%d/%m/%Y', dt) DESC LIMIT 1)[OFFSET (0)] most_recent_status_update,
COUNT(1) AS cnt
FROM `project.dataset.table`
GROUP BY accountID, email
Note: this example assumes specific schema / data types (based on your example in question) - so if your real datatypes are different - you need to adjust things a little :o)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With