I am wondering whether a recursive UDF in BigQuery is the right solution for what I'm doing. But first, is it possible to run a query from inside a UDF?
I see a similar question here: "BigQuery : is it possible to execute another query inside an UDF?", but the solution there seems to be a workaround that executes straight SQL. In my case, I might have to call the UDF repeatedly/recursively without knowing the number of steps in advance (say 3-7 steps).
It's a simple use-case of building a relationship graph over user-name entries in a table, with X degrees of separation, where X will be supplied by the end-user as an argument. My guess is that a recursive-style UDF would work well, but is it possible?
EDIT: More detail on the use-case:
Consider a table with transaction data, which contains the counterparts in each row, along with some other information:
Buyer, Seller
Bob->Alice
Bob->Carol
Bob->John
John->Peter
John->Sam
Bob->Mary
Suppose I want to visualize the relationship of Bob with his counterparts, with 1 degree of separation (i.e. also showing each counterpart's relationships 1 step removed from Bob). I want to use a force graph like this one: D3 Force-Collapsible Graph
This graph requires a .json file with the following structure:
{
"name": "Bob", "size":5000,
"children":
[
{"name":"Alice","size":3000},
{"name":"Carol","size":3000},
{"name":"John","size":3000,
"children":[
{"name":"Peter","size":3000},
{"name":"Sam","size":3000}
]},
{"name":"Mary","size":3000}
]
}
So, with 1 degree of separation, Bob has 4 children, and of those, John has 2 children. This can go deeper with X degrees of separation, ideally with X provided by the user, but in practice it can also be hard-coded to, say, level 3 or 5.
Try the query below.
It is generic enough and follows a simple pattern, so it is easy to extend if you need more degrees of separation.
For the sake of example I introduced logic for the size attribute: in the example below it is literally the number of items in the node's subtree, including the node itself, so it is essentially the count of descendants + 1.
So, enjoy:
#standardSQL
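-- Subtree size: 1 + number of 'name' occurrences in the children string (1 when it is NULL).
-- DIV keeps the count an integer; the "/" operator would return FLOAT64.
CREATE TEMP FUNCTION size(item STRING) AS (
(SELECT CAST(IFNULL(1 + DIV(LENGTH(item) - LENGTH(REPLACE(item, 'name', '')), 4), 1) AS STRING))
);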
CREATE TEMP FUNCTION dress(parent STRING, children STRING) AS (
(SELECT CONCAT('{"name":"', parent, '","size":', size(children), IFNULL(CONCAT(',"children":[', children, ']'), ''), '}'))
);
WITH items AS (
SELECT 'Bob' AS parent, 'Alice' AS child UNION ALL
SELECT 'Bob' AS parent, 'Carol' AS child UNION ALL
SELECT 'Bob' AS parent, 'John' AS child UNION ALL
SELECT 'John' AS parent, 'Peter' AS child UNION ALL
SELECT 'John' AS parent, 'Sam' AS child UNION ALL
SELECT 'Peter' AS parent, 'Sam' AS child UNION ALL
SELECT 'Sam' AS parent, 'Mike' AS child UNION ALL
SELECT 'Sam' AS parent, 'Nick' AS child UNION ALL
SELECT 'Bob' AS parent, 'Mary' AS child
), degree2 AS (
SELECT d1.parent AS parent, d1.child AS child_1, d2.child AS child_2
FROM items AS d1 LEFT JOIN items AS d2 ON d1.child = d2.parent
), degree3 AS (
SELECT d1.*, d2.child AS child_3
FROM degree2 AS d1 LEFT JOIN items AS d2 ON d1.child_2 = d2.parent
), degree4 AS (
SELECT d1.*, d2.child AS child_4
FROM degree3 AS d1 LEFT JOIN items AS d2 ON d1.child_3 = d2.parent
)
SELECT STRING_AGG(dress(parent, child_1), ',') AS parent FROM (
SELECT parent, STRING_AGG(dress(child_1, child_2), ',') AS child_1 FROM (
SELECT parent, child_1, STRING_AGG(dress(child_2, child_3), ',') AS child_2 FROM (
SELECT parent, child_1, child_2, STRING_AGG(dress(child_3, child_4), ',') AS child_3 FROM (
SELECT parent, child_1, child_2, child_3, STRING_AGG(dress(child_4, NULL), ',') AS child_4 FROM degree4
GROUP BY 1,2,3,4 ORDER BY 1,2,3,4 )
GROUP BY 1,2,3 ORDER BY 1,2,3 )
GROUP BY 1,2 ORDER BY 1,2 ) GROUP BY 1 ORDER BY 1 )
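It returns what you need: one tree per distinct parent (Bob, John, Peter, Sam), comma-separated, with the Bob tree first. If you only want Bob, filter degree2 on d1.parent = 'Bob'. See the "beautified" version of the output below: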
{"name": "Bob","size": 12,"children": [
{"name": "Alice","size": 1},
{"name": "Carol","size": 1},
{"name": "John","size": 8,"children": [
{"name": "Peter","size": 4,"children": [
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Mary","size": 1}
]},
{"name": "John","size": 8,"children": [
{"name": "Peter","size": 4,"children": [
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Peter","size": 4,"children": [
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
Most likely, the above can be generalized further, but I thought it was already good enough for you to try :o)
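One way the fixed chain of degreeN joins might be generalized is a recursive helper, for instance as the body of a JavaScript UDF that receives the whole edge list as an argument. The following is only a sketch under that assumption: buildTree, maxDepth and the {parent, child} edge shape are hypothetical names, not part of the query above, and it would still need to be wrapped in a UDF definition to run in BigQuery. Here maxDepth means the number of child levels expanded below the root.

```javascript
// Hypothetical helper (not part of the query above): builds the nested
// {name, size, children} object from a flat edge list, expanding at most
// maxDepth levels of children below the root.
function buildTree(edges, root, maxDepth) {
  // Index children by parent once, so each lookup is a single Map access.
  const childrenOf = new Map();
  for (const { parent, child } of edges) {
    if (!childrenOf.has(parent)) childrenOf.set(parent, []);
    childrenOf.get(parent).push(child);
  }

  function visit(name, depthLeft, path) {
    const node = { name, size: 1 };
    const kids = depthLeft > 0 ? (childrenOf.get(name) || []) : [];
    // Skip names already on the current path so a cycle (A->B->A) is not expanded.
    const children = kids
      .filter((k) => !path.includes(k))
      .map((k) => visit(k, depthLeft - 1, path.concat(k)));
    if (children.length > 0) {
      node.children = children;
      // size = this node plus every node below it
      node.size = 1 + children.reduce((sum, c) => sum + c.size, 0);
    }
    return node;
  }

  return visit(root, maxDepth, [root]);
}

// Same sample rows as the items CTE above.
const edges = [
  { parent: 'Bob', child: 'Alice' },
  { parent: 'Bob', child: 'Carol' },
  { parent: 'Bob', child: 'John' },
  { parent: 'John', child: 'Peter' },
  { parent: 'John', child: 'Sam' },
  { parent: 'Peter', child: 'Sam' },
  { parent: 'Sam', child: 'Mike' },
  { parent: 'Sam', child: 'Nick' },
  { parent: 'Bob', child: 'Mary' },
];

const tree = buildTree(edges, 'Bob', 4);
console.log(JSON.stringify(tree, null, 1));
```

With the sample rows, buildTree(edges, 'Bob', 4) gives a root size of 12 and a John subtree of size 8, the same numbers as the Bob tree in the output above. A smaller maxDepth simply cuts the tree off earlier.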
You can have a JavaScript UDF make recursive calls, but it can't execute another SQL statement. If you know the number of recursions/iterations in advance, it may be possible to define a SQL function instead, such as:
#standardSQL
CREATE TEMP FUNCTION SumToN(x INT64) AS (
(SELECT SUM(v) FROM UNNEST(GENERATE_ARRAY(1, x)) AS v)
);
Using GENERATE_ARRAY, you can create a for loop of the desired length. Here's another example that doesn't involve a UDF, but uses GENERATE_ARRAY to concatenate a variable number of strings:
#standardSQL
WITH T AS (
SELECT 2 AS x, 'foo' AS y UNION ALL
SELECT 4 AS x, 'bar' AS y)
SELECT
y,
(SELECT STRING_AGG(CONCAT(y, CAST(v AS STRING)))
FROM UNNEST(GENERATE_ARRAY(1, x)) AS v) AS rep_y
FROM T;
+-----+---------------------+
| y | rep_y |
+-----+---------------------+
| foo | foo1,foo2 |
| bar | bar1,bar2,bar3,bar4 |
+-----+---------------------+
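The same "loop of a known length" idea can be applied to the graph in the question when the rows are already in memory, e.g. inside a JavaScript UDF body that was handed the edges as an argument. This is just a sketch under that assumption: withinDegrees and the {parent, child} shape are made-up names for illustration, and x counts hops from the starting name.

```javascript
// Hypothetical helper: returns every name reachable from root in at most
// x hops, using a plain loop that runs exactly x times.
function withinDegrees(edges, root, x) {
  const seen = new Set([root]);
  let frontier = [root];
  for (let step = 0; step < x; step++) {
    const next = [];
    for (const { parent, child } of edges) {
      // Follow only edges that start at the current frontier and lead somewhere new.
      if (frontier.includes(parent) && !seen.has(child)) {
        seen.add(child);
        next.push(child);
      }
    }
    frontier = next;
  }
  return [...seen];
}

// Sample rows from the question.
const sample = [
  { parent: 'Bob', child: 'Alice' },
  { parent: 'Bob', child: 'Carol' },
  { parent: 'Bob', child: 'John' },
  { parent: 'John', child: 'Peter' },
  { parent: 'John', child: 'Sam' },
  { parent: 'Bob', child: 'Mary' },
];

console.log(withinDegrees(sample, 'Bob', 1));
console.log(withinDegrees(sample, 'Bob', 2));
```

Because the loop count is fixed by x, there is no recursion here at all; each pass widens the set by one hop, which is roughly what each extra degreeN join does in SQL.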