Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Working with Structs within Arrays for new BigQuery Standard SQL

I'm trying to find rows with duplicate fields in an array of structs within a Google BigQuery table, using the new Standard SQL. The data in the table (simplified) where each row looks a bit like this:

{
  "Session": "abc123",
  "Information" [
    {
      "Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"
    },
    {
      "Identifier": "1c62813f-7ec4-4968-b18b-d1eb8f4d9d26"
    },
    {
      "Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"
    }
  ]
}

My end goal is to display the rows that have Information entities with duplicate Identifier values present. However, most of the queries I attempt get an error message of the following form:

Cannot access field Identifier on a value with type ARRAY<STRUCT<Identifier STRING>>

Is there a way to work with the data inside of a STRUCT within an ARRAY?


Here's my first attempt at a query:

SELECT
  Session,
  Information
FROM
  `events.myevents`
WHERE
  COUNT(DISTINCT Information.Identifier) != ARRAY_LENGTH(Information.Identifier)
LIMIT
  1000

And another using a subquery:

SELECT
  Session,
  Information
FROM (
  SELECT
    Session,
    Information,
    COUNT(DISTINCT Information.Identifier) AS info_count_distinct,
    ARRAY_LENGTH(Information) AS info_count
  FROM
    `events.myevents`
  WHERE
    COUNT(DISTINCT Information.Identifier) != ARRAY_LENGTH(Information.Identifier)
  LIMIT
    1000)
WHERE
  info_count != info_count_distinct
like image 661
Aaron Avatar asked Oct 24 '25 00:10

Aaron


1 Answers

Try below

SELECT Session, Identifier, COUNT(1) AS dups
FROM `events.myevents`, UNNEST(Information)
GROUP BY Session, Identifier
HAVING dups > 1
ORDER BY Session

Should give you what you expect plus number of dups.
Like below (example)

Session Identifier                              dups     
abc123  e8d971a4-ef33-4ea1-8627-f1213e4c67dc    2    
abc345  1c62813f-7ec4-4968-b18b-d1eb8f4d9d26    3    
like image 58
Mikhail Berlyant Avatar answered Oct 25 '25 15:10

Mikhail Berlyant