I have a column in BigQuery that contains a URL querystring, eg a=1&c=1. I'd like to reference some of these in my query, eg. filtering with something like WHERE querystring.c = 1.
My plan is to convert the querystring to JSON, and then use JSON_EXTRACT. I figured I could write a UDF to convert the querystring to JSON, but I can't work out anyway to import the node.js querystring package into my UDF to make this easy.
Is it possible to import a node.js core library into a UDF, and if so, how? Alternatively is there a better way to achieve what I'm trying to do?
Alternatively is there a better way to achieve what I'm trying to do?
I think - Yes - using JS UDF is expensive resource wise and has some limitations. Using SQL UDF is less expensive and if you want you can transform below by using SQL UDF - but at least below gives you idea of "alternative" approach
For BigQuery Standard SQL
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
SELECT 2, 'c=2&b=3'
)
SELECT
id,
querystring,
SPLIT(kv, '=')[SAFE_OFFSET(0)] AS key,
SPLIT(kv, '=')[SAFE_OFFSET(1)] AS value
FROM yourTable, UNNEST(SPLIT(querystring, '&')) AS kv
above allows you to "extract" all key-value pairs like below
id querystring key value
2 c=2&b=3 b 3
1 a=1&c=1 c 1
1 a=1&c=1 a 1
2 c=2&b=3 c 2
So now you can use them in WHERE clause like below
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
SELECT 2, 'c=2&b=3'
)
SELECT
id,
querystring,
FROM yourTable, UNNEST(SPLIT(querystring, '&')) AS kv
WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = 'c'
AND SPLIT(kv, '=')[SAFE_OFFSET(1)] = '1'
this gives below result
id querystring
1 a=1&c=1
Note: this is just quick and abstract illustration of approach - I hope you will adjust/adopt it to your specific case
Below is example of transforming above to use SQL UDF
#standardSQL
CREATE TEMPORARY FUNCTION parse(qs STRING, key STRING) AS (
(SELECT SPLIT(kv, '=')[SAFE_OFFSET(1)] FROM UNNEST(SPLIT(qs, '&')) AS kv WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = key )
);
WITH yourTable AS (
SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
SELECT 2, 'c=2&b=3'
)
SELECT
id,
querystring
FROM yourTable
WHERE parse(querystring, 'c') = '1'
Note: usually querystring do not have keys duplication - so case of having dups is not addressed - but easy to if needed :o)
but it doesn't decode any encoded components, so my values will still contain things like %20. Any suggestion on that?
#standardSQL
CREATE TEMPORARY FUNCTION parse(qs STRING, key STRING) AS (
(SELECT SPLIT(kv, '=')[SAFE_OFFSET(1)] FROM UNNEST(SPLIT(qs, '&')) AS kv WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = key )
);
CREATE TEMP FUNCTION decode(str STRING)
RETURNS STRING
LANGUAGE js AS """
if (str == null) return null;
try {
return decodeURIComponent(str);
} catch (e) {
return str;
}
""";
WITH yourTable AS (
SELECT 1 AS id, 'a=1&c=1&d=a%20b%20c' AS querystring UNION ALL
SELECT 2, 'c=2&b=3'
)
SELECT
id,
querystring,
decode(parse(querystring, 'd')) as d
FROM yourTable
WHERE parse(querystring, 'c') = '1'
result is
id querystring d
-- ------------------- -----
1 a=1&c=1&d=a%20b%20c a b c
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With