Spark ML: Taking square root of feature columns

Hi, I am using a custom UDF to take the square root of each value in each feature column.

import math
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

square_root_UDF = udf(lambda x: math.sqrt(x), DoubleType())

for x in features:
    dataTraining = dataTraining.withColumn(x, square_root_UDF(x))

Is there any faster way to do this? The polynomial expansion function is not suitable in this case.

asked Oct 27 '25 by sjishan


1 Answer

Don't use a UDF. Use the built-in sqrt function instead:

from pyspark.sql.functions import sqrt

for x in features:
    dataTraining = dataTraining.withColumn(x, sqrt(x))
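If you want to avoid the loop of withColumn calls entirely, you can apply sqrt to all feature columns in a single select. This is a minimal sketch assuming features is the list of column names from the question and dataTraining is the same DataFrame; any non-feature columns are carried through unchanged (note the column order changes, with features listed first):

from pyspark.sql.functions import col, sqrt

# Apply sqrt to every feature column in one projection,
# keeping the remaining columns as-is.
other_cols = [c for c in dataTraining.columns if c not in features]
dataTraining = dataTraining.select(
    *[sqrt(col(c)).alias(c) for c in features],
    *[col(c) for c in other_cols]
)
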
answered Oct 30 '25 by user7757642