An example of this usage is the solution posted here. The code looks like this (BigQuery, Standard SQL):
SELECT ANY_VALUE(e).*, MAX(SentDateTime) SentDateTime
FROM `project.dataset.table2` e
JOIN `project.dataset.table1` s
ON e.EmailName = s.EmailName
AND EventDateTime > SentDateTime
GROUP BY FORMAT('%t', e)
From the ANY_VALUE() documentation, I think I understand what the function does (it returns any value from the given set). I don't understand what it does in this context though.
When I leave out the GROUP BY line, it will return some row from the table. That's straightforward. When the GROUP BY is added, it returns ALL rows from the table and that doesn't make sense to me. Can someone explain?
I interpret it as a way of grouping by all the columns without listing them explicitly somehow, but when I read the answers to this post, it's a bit worrying to read "If you are depending on a particular value being returned, it is at your risk of it not working at some point."
Think of it as grouping by a hash of the whole row. Here the grouping key is FORMAT('%t', e), where e is the entire row of the table rendered as a string. To get the row itself back out of each group, you use ANY_VALUE(e). Every e within a group is identical, so ANY_VALUE(e) returns the same row no matter which one it picks. Finally, .* expands that row into all of its fields. That simple! Hope it is clear now.
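To make the mechanics concrete, here is a small Python emulation of what the query does. It is only a sketch of the logic, not BigQuery itself: the sample rows, the integer timestamps, and the function name latest_send_per_event are all made up for illustration, and repr(sorted(e.items())) merely stands in for FORMAT('%t', e).

```python
from collections import defaultdict

# Hypothetical sample data standing in for table2 (events) and table1 (sends).
# Integers are used in place of real timestamps to keep the sketch short.
events = [
    {"EmailName": "welcome", "EventDateTime": 5},
    {"EmailName": "welcome", "EventDateTime": 9},
    {"EmailName": "promo", "EventDateTime": 4},
]
sends = [
    {"EmailName": "welcome", "SentDateTime": 1},
    {"EmailName": "welcome", "SentDateTime": 6},
    {"EmailName": "promo", "SentDateTime": 2},
]


def latest_send_per_event(events, sends):
    """Emulate: SELECT ANY_VALUE(e).*, MAX(SentDateTime) ... GROUP BY FORMAT('%t', e)."""
    groups = defaultdict(list)
    for e in events:
        for s in sends:
            # JOIN condition from the query.
            if e["EmailName"] == s["EmailName"] and e["EventDateTime"] > s["SentDateTime"]:
                # Stand-in for FORMAT('%t', e): the whole row rendered as a string.
                key = repr(sorted(e.items()))
                groups[key].append((e, s["SentDateTime"]))

    result = []
    for pairs in groups.values():
        # ANY_VALUE(e): every e in this group is identical, so the first is as good as any.
        any_row = pairs[0][0]
        # ".*" expands the row into its fields; MAX(SentDateTime) is added alongside.
        result.append({**any_row, "SentDateTime": max(sent for _, sent in pairs)})
    return result


rows = latest_send_per_event(events, sends)
for row in rows:
    print(row)
```

Each distinct event row ends up as its own group, which is why the query returns one output row per joined row of the table rather than a single arbitrary row. One thing to keep in mind: two rows that are exact duplicates produce the same key and therefore collapse into a single group.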