Dynamically infer Schema of returned object from UDF in pySpark

I want to use a UDF in PySpark that returns not an atomic value but a nested structure. I know that I can register the UDF and manually set the schema of the object it will return, e.g.

from pyspark.sql.types import (ArrayType, StructType, StructField,
                               IntegerType, StringType)

format = ArrayType(
    StructType([
        StructField('id', IntegerType()),
        StructField('text', StringType())
    ])
)
spark.udf.register('functionName', functionObject, format)

and use Python lists inside the UDF to match that format, e.g.

return [[1,'A'],[2,'B']]

but is there any way to avoid explicitly setting the return type when registering the UDF, and instead automatically infer its schema?

If I don't set a return type, it is automatically set to StringType.
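
For context, here is a minimal end-to-end sketch of the approach described above. The function name tag_tokens and the sample data are made up for illustration; the rendering of the nested result in the comment assumes Spark 3.x.

from pyspark.sql import SparkSession
from pyspark.sql.types import (ArrayType, StructType, StructField,
                               IntegerType, StringType)

spark = SparkSession.builder.getOrCreate()

# Hypothetical UDF returning a nested structure: each struct row is a
# plain Python list matching the declared fields (id, text).
def tag_tokens(s):
    return [[i + 1, tok] for i, tok in enumerate(s.split())]

schema = ArrayType(StructType([
    StructField('id', IntegerType()),
    StructField('text', StringType())
]))

spark.udf.register('tag_tokens', tag_tokens, schema)
spark.sql("SELECT tag_tokens('A B') AS tagged").show(truncate=False)
# On Spark 3.x this prints something like: [{1, A}, {2, B}]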

asked Oct 20 '25 by Johnny16


1 Answer

is there any way to avoid explicitly setting the return type when registering the UDF, and instead automatically infer its schema?

There is not. The schema has to be known before the UDF is called: Spark resolves the query plan before any Python code runs, so the return type cannot be inferred at runtime.
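
One way to trim the boilerplate, though the type still has to be spelled out, is to pass the return type as a DDL-formatted string, which udf() and spark.udf.register() both accept (Spark 2.3+). A sketch, reusing the hypothetical tag_tokens function from the question:

from pyspark.sql.functions import udf

# Same schema as the ArrayType(StructType([...])) declaration,
# written as a DDL type string; still explicit, just shorter.
tag_tokens = udf(lambda s: [[i + 1, t] for i, t in enumerate(s.split())],
                 'array<struct<id:int,text:string>>')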

answered Oct 23 '25 by user7718275


