I'm investigating porting some bigquery legacy sql containing windowed distinct counts like this
count(distinct brand_id) over (partition by user_id order by order_placed_at range between 7 * 24 * 60 * 60 * 1000000 PRECEDING AND 1 PRECEDING) as last_7_day_buyer_brands
to Standard sql.... but I get this error....
Window ORDER BY is not allowed if DISTINCT is specified
For reference I've tried APPROX_COUNT_DISTINCT function with no luck.
Is there a better way to get this to work other than write the subqueries and group by's?
Most of the other queries have ported to standard sql with only minor changes.
Per documentation
OVER clause requirements:
PARTITION BY: Optional.
ORDER BY: Optional. Disallowed if DISTINCT is present.
window_frame_clause: Optional. Disallowed if DISTINCT is present.
note: above is 'highlighted' by me, not as in documentation
As you can see not only ORDER BY but even RANGE BETWEEN is not allowed when DISTINCT is used
I think, subquery is the way to go.
In case if you need direction for this, use below simple example
#standardSQL
SELECT
user_id,
order_placed_at,
brand_id,
(SELECT COUNT(DISTINCT brand)
FROM UNNEST(last_7_day_buyer_brands_with_dups) AS brand
) AS last_7_day_buyer_brands
FROM (
SELECT
user_id,
order_placed_at,
brand_id,
ARRAY_AGG(brand_id) OVER(
PARTITION BY user_id ORDER BY order_placed_at
RANGE BETWEEN 7 * 24 * 60 * 60 * 1000000 PRECEDING AND 1 PRECEDING
) AS last_7_day_buyer_brands_with_dups
FROM yourTable
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With