I have an analytical function challenge on Bigquery that is messing with my mind. Sorry if I am missing any fundamental function here, but I couldn't find it.
Anyway I think this can lead to a good discussion.
I would like to get a rank among groups (dense_rank or row_number or something like that), but on a clustered fashion, which is the tricky bit.
For example, I would like to create clusters based not only on two partition columns (see image below) but based on the order among them as well. This is why I am calling it cluster. Each cluster should have the same rank if it's adjacent, but a different number if it is not (it was split by other cluster).
So, for cluster "a, x", all rows for the first cluster have number 1, then all rows for the second cluster have number 2, and so on.
How can I achieve this? Is there an analytical function out of the box for this or does this require a bit of auxiliary columns?
Thanks in advance.
Consider below approach
select partition_col_1, partition_col_2, order_col,
dense_rank() over(partition by partition_col_1, partition_col_2 order by group_number) output
from (
select *, countif(group_start) over(order by order_col) group_number
from (
select *,
ifnull((partition_col_1, partition_col_2) != lag((partition_col_1, partition_col_2)) over(order by order_col), true) group_start
from `project.dataset.table`
)
)
# order by order_col
if applied to sample data in your question - output is
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With