Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Snowflake: ARRAY_AGG preserving "NULL"s

ARRAY_AGG function omits NULL by definition.

The requirement is to generate array with the following format:

SELECT [NULL, 'a1', 'a2', 'a3', NULL]
-- [   undefined,   "a1",   "a2",   "a3",   undefined ]

from source data using ARRAY_AGG function and nullifying all entries not starting with letter a:

CREATE OR REPLACE TABLE tab(id INT, val TEXT) AS
SELECT * FROM VALUES  (1,'b0'), (2,'a1'), (3,'a2'), (4,'a3'), (5,'b1');

Attempt 1:

SELECT ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN val END) 
       WITHIN GROUP (ORDER BY id) AS result,
       result = [NULL, 'a1', 'a2', 'a3', NULL] AS is_output_as_requested
FROM  tab;
-- [   "a1",   "a2",   "a3" ]   FALSE

Result: ARRAY_AGG works as intended and skips NULL; The output does not match the requirement.

Attempt 2:

SELECT ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN val ELSE PARSE_JSON('null') END) 
   WITHIN GROUP (ORDER BY id) AS result,
   result = [NULL, 'a1', 'a2', 'a3', NULL] AS is_output_as_requested
FROM tab;
--  [   "a1",   "a2",   "a3" ]   FALSE

Rationale: CASE expression output must have the same data type, therefore implicit conversion: TO_CHAR(PARSE_JSON('null')) <=> NULL

Attempt 3:

SELECT ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN val::VARIANT ELSE PARSE_JSON('null') END) 
      WITHIN GROUP (ORDER BY id)AS result,
      result = [NULL, 'a1', 'a2', 'a3', NULL] AS is_output_as_requested
FROM tab;
--  [   null,   "a1",   "a2",   "a3",   null ]  FALSE

Almost what is requested, though not quite. Type of the first and last element is NULL_VALUE.


I am aware of VARIANT NULL. I am seeking for a way to "preserve NULLs" in exactly the format as requested for ARRAY_AGG.

The solution must be a standalone SQL expression (no Snowflake Scripting/UDFs).

like image 331
Lukasz Szozda Avatar asked Oct 27 '25 00:10

Lukasz Szozda


1 Answers

It is possible to achieve it by aggregating arrays instead of scalar values:

SELECT ARRAY_FLATTEN(ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN [val] ELSE [NULL] END)
       WITHIN GROUP (ORDER BY id)) AS result,
       result = [NULL, 'a1', 'a2', 'a3', NULL] AS is_output_as_requested 
FROM tab;
-- [   undefined,   "a1",   "a2",   "a3",   undefined ] TRUE

Rationale: ARRAY_AGG does not skip [NULL].

Step 1:

SELECT ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN [val] ELSE [NULL] END)
       WITHIN GROUP (ORDER BY id) AS result
FROM tab;
-- [ [undefined], [ "a1"], ["a2"], ["a3"], [undefined] ]

Step 2:

SELECT ARRAY_FLATTEN(...)
-- [ undefined, "a1", "a2", "a3", undefined ]

In general:

SELECT ARRAY_FLATTEN(ARRAY_AGG([some_col]))
FROM tab;
like image 163
Lukasz Szozda Avatar answered Oct 29 '25 13:10

Lukasz Szozda