I have read that it's a bad idea to parse XML/HTML using regular expressions. The alternative suggestion is to use an XML parser. Does one exist in the BigQuery Standard SQL library?
Here is the documentation on how to use JavaScript UDFs in BigQuery, as Elliot mentioned:
https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
I imagine the UDF might look something like this:
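-- Temporary JavaScript UDF; fromXML is defined by the library loaded in OPTIONS.
CREATE TEMPORARY FUNCTION XML(x STRING)
RETURNS STRING
LANGUAGE js AS """
  // Parse the XML string into a plain JavaScript object.
  var data = fromXML(x);
  // Return the contents of the top-level <title> element.
  return data.title;
"""
OPTIONS(
  library="gs://<BUCKET_NAME>/from-xml.min.js"
);

SELECT XML(a) FROM UNNEST(["<title>Title of Page</title>"]) AS a;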
Here, from-xml.min.js comes from this library and has to be uploaded to a Cloud Storage bucket you can read from (put its name in place of <BUCKET_NAME>).
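The UDF body assumes the input is non-null and that the parsed object has a string title. As a rough sketch, the same logic can be written with guards and tried outside BigQuery. The names extractTitle and stubParse are hypothetical and only for illustration. stubParse is a stand-in that returns a fixed object, not a real parser. Inside the UDF you would pass fromXML instead.

```javascript
// Hypothetical helper mirroring the UDF body; `parse` stands in for the
// library's fromXML so the logic can be exercised outside BigQuery.
function extractTitle(x, parse) {
  // Treat a missing input as a missing output instead of throwing.
  if (x === null || x === undefined) {
    return null;
  }
  var data = parse(x);
  var title = data ? data.title : undefined;
  // Only return plain strings; an absent element or a nested object becomes null.
  return typeof title === "string" ? title : null;
}

// Stub for local testing only: returns a fixed object and does NOT parse XML.
function stubParse(xml) {
  return { title: "Title of Page" };
}

console.log(extractTitle("<title>Title of Page</title>", stubParse));
console.log(extractTitle(null, stubParse));
```

In the UDF itself, the body would then reduce to something like return extractTitle(x, fromXML); with the helper defined above it in the same JavaScript block. Returning null from JavaScript should surface as NULL in the query result.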