Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining OBJECT_AGG and GROUP BY in Snowflake

Is there a way to combine an GROUP BY and OBJECT_AGG in snowflake with different levels of aggregation between the two?

In the following example, we wish to return, for each city, an object mapping cuisine to the highest michelin stars of that cuisine in the city:

City Restaurant Cuisine Michelin Stars
San Fransisco Quince Californian 3
San Fransisco Coi Californian 2
San Fransisco Mister Jius Chinese 1
London Le Gavroche French 2
London La Dame de Pic French 2
London Restaurant Gordon Ramsay French 3

We want to produce the following:

City Cuisine to Top Rating
San Fransisco {'Californian': 3, 'Chinese': 1}
London {'French': 3}

My initial approach is:

SELECT
city,
OBJECT_AGG(cuisine, MAX(michelin_stars)::variant) over (partition by cuisine)
FROM
top_restaurants
GROUP BY city

This returns the error cuisine is not a valid group by expression.

This question is similar to

  • Snowflake: "SQL compilation error:... is not a valid group by expression"

From the above, I believe the cause is the GROUP BY is calculated before the PARTITION BY and breaks when the GROUP BY drops cuisine as it attempts to aggregate.

References:

https://docs.snowflake.com/en/sql-reference/functions/object_agg.html

like image 647
rmstmppr Avatar asked Sep 12 '25 21:09

rmstmppr


2 Answers

Thanks to the order of operation, you can still do it in one select. You just have to aggregate by city and cuisine first. When it's time for window function to shine, you partition by city. Obviously this leads to duplicates because window function simply applies calculations to the result set left by group by without collapsing any rows. Add a distinct and you're good to go

select distinct 
       city,
       object_agg(cuisine, max(michelin_stars)::variant) over (partition by city)
from t
group by city, cuisine;
like image 82
Rajat Avatar answered Sep 15 '25 12:09

Rajat


I believe the cause is the GROUP BY is calculated before the PARTITION BY and breaks when the GROUP BY drops cuisine as it attempts to aggregate.

This is exactly what happened.

Related: A Beginner’s Guide to the True Order of SQL Operations


Alternative approach:

WITH cte AS (
   SELECT *
   FROM top_restaurants
   QUALIFY ROW_NUMBER() OVER(PARTITION BY city,cuisine 
                             ORDER BY michelin_stars DESC) = 1
)
SELECT city,
       OBJECT_AGG(cuisine, michelin_stars::variant)
FROM cte
GROUP BY city
like image 30
Lukasz Szozda Avatar answered Sep 15 '25 14:09

Lukasz Szozda