Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery argmax: Is array order maintained when doing CROSS JOIN UNNEST

Question:

In BigQuery, standard SQL, if I run

SELECT *
FROM mytable
CROSS JOIN UNNEST(mytable.array)

Can I be certain that the resulting row order is the same as the array order?

Example:

Let's say I have the following table mytable:

Row | id   | prediction
1   | abcd | [0.2, 0.5, 0.3]

If I run SELECT * FROM mytable CROSS JOIN UNNEST(mytable.prediction), can I be certain that the row order is the same as the array order? I.e. will the resulting table always be:

Row | id   | unnested_prediction
1   | abcd | 0.2
2   | abcd | 0.5
3   | abcd | 0.3

More background on use case (argmax):

I'm trying to find the array index with the largest value for the array in each row (argmax), i.e. the second element (0.5) in the array above. My target output is thus something like this:

Row | id   | argmax
1   | abcd | 2

Using CROSS JOIN, a DENSE_RANK window function ordered by the prediction value and a ROW_NUMBER window function to find the argmax, I am able to make this work with some test data. You can verify with this query:

WITH predictions AS (
  SELECT 'abcd' AS id, [0.2, 0.5, 0.3] AS prediction
  UNION ALL
  SELECT 'efgh' AS id, [0.7, 0.2, 0.1] AS prediction
),
ranked_predictions AS (
  SELECT 
    id,
    ROW_NUMBER() OVER (PARTITION BY id) AS rownum, -- This is the ordering I'm curious about
    DENSE_RANK() OVER (PARTITION BY id ORDER BY flattened_prediction DESC) AS array_rank
  FROM
     predictions P
  CROSS JOIN
    UNNEST(P.prediction) AS flattened_prediction
)
SELECT
  id,
  rownum AS argmax
FROM
  ranked_predictions
WHERE array_rank = 1

It could just be a coincidence that ROW_NUMBER behaves well in my tests (i.e. that it is ordered according to the unnested array), so it would be nice to be certain.

like image 740
dlebech Avatar asked Dec 07 '25 11:12

dlebech


1 Answers

Can I be certain that the resulting row order is the same as the array order?

you should use WITH OFFSET to get position of the elements in the array, so then you can use them for ordering in your further logic

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'abcd' id, [0.2, 0.5, 0.3] prediction
)
SELECT id, unnested_prediction
FROM `project.dataset.table`, 
UNNEST(prediction) unnested_prediction WITH OFFSET pos
ORDER BY id, pos  
like image 109
Mikhail Berlyant Avatar answered Dec 10 '25 02:12

Mikhail Berlyant



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!