I just started using BigQuery to explore my company's GA session data. I'm trying to generate a query that will allow me to generate the counts for each experimentId that has been passed to GA.
The experimentId is stored in hits.experiment.experimentId as a nested field. So far I have the following query, but it seems inefficient to have to do the left join to get this info. How can I optimize this query for variables that are nested in another nested array?
SELECT
e.experimentId,
count(*)
FROM
`project-name.IDNUM.ga_sessions_20170527`,
UNNEST(hits) AS hits
LEFT JOIN UNNEST(hits.experiment) as e
GROUP BY e.experimentId
You can use the CROSS JOIN operation again using the unnested hits field:
SELECT
e.experimentId,
count(*)
FROM
`project-name.IDNUM.ga_sessions_20170527`,
UNNEST(hits) AS hits,
UNNEST(hits.experiment) as e
GROUP BY e.experimentId
You can also find some examples in BQ docs on how to process repeated fields (arrays) and a bunch of different techniques that will help you on several types of analyzes you want to run in BQ.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With