Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting JSON objects in Athena / Presto for dynamic keys

I have two string formatted columns in my table. One of the columns has json objects in it. The keys in the JSON objects are not fixed - so the problem is I cant use standard json extract functions. Here is how the table looks

timestamp upstream_tables
2023-02-02T17:34:55.946Z {"ETL_table_1":true,"ETL_table_2":true}
2023-02-02T13:30:11.882Z {"ETL_table_3":true}
2023-02-02T20:12:22.116Z {"ETL_table_4":true,"ETL_table_5":true,"ETL_table_6":false}

I want to flatten this table to something like below

timestamp upstream_tablename was_completed
2023-02-02T17:34:55.946Z ETL_table_1 TRUE
2023-02-02T17:34:55.946Z ETL_table_2 TRUE
2023-02-02T13:30:11.882Z ETL_table_3 TRUE
2023-02-02T20:12:22.116Z ETL_table_4 TRUE
2023-02-02T20:12:22.116Z ETL_table_5 TRUE
2023-02-02T20:12:22.116Z ETL_table_6 FALSE

Can anyone please help? Have spent a lot of time using map_values and map_keys but couldnt get it right.

The only closest thing I could come up with was this

select
   timestamp,
    t.*
FROM mytable
   CROSS JOIN UNNEST(map_entries(CAST(json_extract(upstream_tables, '$') AS MAP(VARCHAR, VARCHAR)))) AS t
like image 685
chuzzle Avatar asked Jan 19 '26 00:01

chuzzle


1 Answers

You can do this with a combination of UNNEST and json_query.

First, use json_query to convert all the field-value pairs to a normalized form with the shape {"name": "xxx", value: yyy}. Then, convert these to an array of rows and unnest them into individual rows.

WITH data(ts, value) AS (
    VALUES
        (from_iso8601_timestamp('2023-02-02T17:34:55.946Z'), VARCHAR '{"ETL_table_1":true,"ETL_table_2":true}'),
        (from_iso8601_timestamp('2023-02-02T13:30:11.882Z'), VARCHAR '{"ETL_table_3":true}'),
        (from_iso8601_timestamp('2023-02-02T20:12:22.116Z'), VARCHAR '{"ETL_table_4":true,"ETL_table_5":true,"ETL_table_6":false}')

)
SELECT ts, t.name, t.value
FROM data, UNNEST(CAST(json_parse(json_query(value, 'strict $.keyvalue()' WITH ARRAY WRAPPER)) AS array(row(name varchar, value boolean)))) t(name, value)

Update (2024-01-12)

In recent versions of Trino, you can use json_table() to extract the values directly, without the need to parse the json data manually or cast it:

WITH data(ts, value) AS (
    VALUES
        (from_iso8601_timestamp('2023-02-02T17:34:55.946Z'), VARCHAR '{"ETL_table_1":true,"ETL_table_2":true}'),
        (from_iso8601_timestamp('2023-02-02T13:30:11.882Z'), VARCHAR '{"ETL_table_3":true}'),
        (from_iso8601_timestamp('2023-02-02T20:12:22.116Z'), VARCHAR '{"ETL_table_4":true,"ETL_table_5":true,"ETL_table_6":false}')

)
SELECT data.ts, t.name, t.value
FROM data,
   json_table(
       data.value,
       'strict $.keyvalue()'
       COLUMNS (
           name VARCHAR PATH 'strict $.name',
           value BOOLEAN PATH 'strict $.value'
       )
   ) AS t
like image 98
Martin Traverso Avatar answered Jan 20 '26 17:01

Martin Traverso