Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bigquery SQL - Is it better to unnest in SELECT or JOIN?

I have a dataset where views are nested inside of sessions and I want a count of views for each session. What is the more efficient/proper way to structure such a query?

Is there any documentation that talks about the preferred way to write queries in BigQuery SQL?

SELECT session_key, ( SELECT COUNT( view_id ) FROM UNNEST( views ) views ) AS view_count 
FROM sessions 
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) ;

SELECT session_key, COUNT( view_id ) AS view_count 
FROM sessions 
  LEFT JOIN UNNEST( views ) views 
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) 
GROUP BY session_key; 

Thank you

like image 674
A.S. Avatar asked Dec 11 '25 04:12

A.S.


2 Answers

Working queries:

a)

SELECT visitId, ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`

b)

SELECT visitId, COUNT( hitNumber ) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
LEFT JOIN UNNEST( hits )  
GROUP BY visitId

The first query looks shorter and more concise, but let's also look at the explain tab:

a) enter image description here

b) enter image description here

It also looks better at execution time! Probably because the second query has a GROUP BY visitId, that forces BigQuery to look if there are any other sessions with the same id.

But if you are looking for an even more concise option:

SELECT visitId, ARRAY_LENGTH(hits) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
like image 55
Felipe Hoffa Avatar answered Dec 13 '25 23:12

Felipe Hoffa


It is not just about which way better? - it is also about which way reflects your goal? because results of those are different! And you can see this in Felipe's answer - first query returns 63 rows and second query returns 62 rows

So, the first query just returns as many rows as your sessions table has along with count of entries in array filed.
Whereas the second query, in addition to above, groups all rows and aggregates respective counts

Of course, if your table has all visitId unique - this produces the same result
But because this extra grouping - I would expect second query to be more expensive

like image 25
Mikhail Berlyant Avatar answered Dec 13 '25 21:12

Mikhail Berlyant