
Spark: get the UDF name from a column and execute it

I registered some UDFs that all have the same input parameter types and the same output type (String). Let's say udf1, udf2, and udf3. Each one does something different.

My dataset has multiple columns, and one of them holds the name of the UDF I want to execute on that row of data.

Dataset example:

+---+-------+-------+
|A  |   B   |udf    |
+---+-------+-------+
|1  |   a   |udf1   |
|2  |   b   |udf2   |
|3  |   c   |udf3   |
+---+-------+-------+

I want to do something like this:

ds.withColumn("TEST", functions.callUDF(<name of right udf>, col("A"), col("B")))

How can I achieve this? Is it possible and if not, what is a possible workaround?

Background: My Spark job has a set of UDFs, and I want to dynamically execute the right UDF for each row.

slooock asked Jan 22 '26 18:01



1 Answer

Try this:

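import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`, as in spark-shell

// The candidate functions all share one signature: (Int, String) => String.
def func1(y: Int, z: String): String = s"$y$z"
def func2(y: Int, z: String): String = s"$y,$z"
def default(y: Int, z: String): String = y.toString // the original returned an Int here, which does not compile

// A single UDF that picks the function by the name stored in the row.
val udfName = udf { (x: String, y: Int, z: String) =>
  x match {
    case "func1" => func1(y, z)
    case "func2" => func2(y, z)
    case _       => default(y, z) // fallback for unknown names
  }
}

val data = Seq(
  (1, "a", "func1"),
  (2, "b", "func2")
).toDF("A", "B", "udf")

data.withColumn("TEST", udfName(col("udf"), col("A"), col("B")))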
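
If the list of functions grows, the match in the UDF above could be swapped for a lookup table. The following is only a minimal sketch of that idea in plain Scala: the names `UdfDispatch`, `RowFn`, `registry`, and `dispatch` are made up for illustration, and the function bodies simply mirror func1, func2, and default from the snippet above.

```scala
// Hypothetical helper: maps a function name to a function with the shared signature.
object UdfDispatch {
  type RowFn = (Int, String) => String

  // Name -> implementation. Add new entries here instead of new match cases.
  val registry: Map[String, RowFn] = Map(
    "func1" -> ((y: Int, z: String) => s"$y$z"),
    "func2" -> ((y: Int, z: String) => s"$y,$z")
  )

  // Fallback used when the name is missing from the registry (or null).
  val default: RowFn = (y, _) => y.toString

  // Look up the function by name and apply it to the row values.
  def dispatch(name: String, y: Int, z: String): String =
    registry.getOrElse(name, default)(y, z)
}
```

Assuming the same imports as above, this could then be wrapped with `udf(UdfDispatch.dispatch _)` and called as `udfName(col("udf"), col("A"), col("B"))`, exactly like the match-based version.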

For a more advanced way of handling this, you can also use the sourcecode library; see:

scala get function name that was sent as param
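
Coming back to the UDFs that are already registered by name in the question, another possible workaround is to build a when/otherwise chain that calls each registered UDF only for the rows that ask for it. This is a sketch under assumptions, not a tested production solution: the object name `RegisteredUdfDispatch`, the helper `withDispatchedUdf`, and the three UDF bodies are hypothetical stand-ins. It uses `callUDF` to match the question; newer Spark versions deprecate it in favor of `call_udf`.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{callUDF, col, lit, when}

object RegisteredUdfDispatch {
  // Adds a TEST column: for each known name, call that registered UDF on (A, B)
  // when the "udf" column equals the name. Unknown names yield null.
  def withDispatchedUdf(df: DataFrame, names: Seq[String]): DataFrame = {
    val dispatch: Column = names.foldLeft(lit(null).cast("string")) { (acc, name) =>
      when(col("udf") === name, callUDF(name, col("A"), col("B"))).otherwise(acc)
    }
    df.withColumn("TEST", dispatch)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-dispatch").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the already registered udf1, udf2, udf3.
    spark.udf.register("udf1", (a: Int, b: String) => s"$a$b")
    spark.udf.register("udf2", (a: Int, b: String) => s"$a,$b")
    spark.udf.register("udf3", (a: Int, b: String) => a.toString)

    val ds = Seq((1, "a", "udf1"), (2, "b", "udf2"), (3, "c", "udf3")).toDF("A", "B", "udf")
    withDispatchedUdf(ds, Seq("udf1", "udf2", "udf3")).show()

    spark.stop()
  }
}
```

The list of names has to be known on the driver when the query is built, since the UDF name passed to `callUDF` is a plain string and not a column. Spark also does not guarantee the evaluation order of subexpressions, so it is safest if every UDF can handle any row's input.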

Saswat answered Jan 24 '26 10:01



