I have a table of events of users, and I want to project those events into a new column with some predicate, and then aggregate the events together per user into a new projection that tells me if a user has ever had the predicate match for them, or if they've never had it match, etc.
In other languages this is usually called all() and any(), where you pass
it a list of boolean values and it will tell you if all of them match, or if at
least one matches. It's equivalent to using a boolean AND on all boolean
values (such as in the case with all) or using a boolean OR on all boolean
values (as in any).
Does BigQuery have this feature? I can sort of approximate it using max and
min but it's not ideal.
Example:
select
month(date_time) m,
count(*) as ct,
max(id_is_present),
min(id_is_present),
max(starts_with_one) max_one,
min(starts_with_one) min_one,
from
(
select
length(user_id) > 1 id_is_present,
regexp_match(user_id, r'^1') starts_with_one,
date_time
from
[user_events.2015_02]
)
group by
m
It's exploiting a behavior of max(true, false, false) yielding true, so you could sort of implement any and all by searching through the column for values and then building from there.
Is this the hack I have to rely on or does BigQuery support boolean aggregates?
In case someone else stumbles across this, standard SQL offers logical_and() and logical_or. So, the code could be written as:
select month(date_time) as m, count(*) as ct,
logical_or(id_is_present),
logical_and(id_is_present),
logical_or(starts_with_one) as max_one,
logical_and(starts_with_one) min_one,
from (select length(user_id) > 1 id_is_present,
regexp_match(user_id, r'^1') starts_with_one,
date_time
from [user_events.2015_02]
) u
group by m;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With