
How to get the name of a Spark Column as String?

I want to write a method to round a numeric column without doing something like:

df
.select(round($"x",2).as("x"))

Therefore I need to have a reusable column-expression like:

def roundKeepName(c:Column,scale:Int) = round(c,scale).as(c.name)

Unfortunately c.name does not exist, so the above code does not compile. I've found a solution for ColumnName:

 def roundKeepName(c:ColumnName,scale:Int) = round(c,scale).as(c.string.name)

But how can I do that with a Column (which is what I get if I use col("x") instead of $"x")?

Raphael Roth asked Oct 25 '25 01:10



1 Answer

Not sure if the question has really been answered. Your function could be implemented like this (for a plain column reference such as col("x"), toString returns the name of the column):

def roundKeepname(c:Column,scale:Int) = round(c,scale).as(c.toString)

In case you don't like relying on toString, here is a more robust version. You can rely on the underlying expression, cast it to a NamedExpression and take its name.

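import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.round
// Round the column, then re-apply the name taken from the underlying expression
def roundKeepname(c:Column,scale:Int) = 
    round(c,scale).as(c.expr.asInstanceOf[NamedExpression].name)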

And it works:

scala> spark.range(2).select(roundKeepname('id, 2)).show
+---+
| id|
+---+
|  0|
|  1|
+---+  
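
One caveat with the cast: it only succeeds when the underlying expression actually is a NamedExpression (a plain column reference or an alias). Something like col("x") + 1 would throw a ClassCastException. Here is a minimal sketch of a more defensive variant, assuming a Spark version where Column.expr is exposed (the 2.x and 3.x lines). The name roundKeepNameSafe is made up for this example, and falling back to Spark's default naming is just one possible choice:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.{col, round}

// Hypothetical helper (not part of Spark): keep the original name when the
// column has one, otherwise leave the rounded column with its default name.
def roundKeepNameSafe(c: Column, scale: Int): Column =
  c.expr match {
    // Plain references (col("x"), $"x") and aliases carry a name
    case named: NamedExpression => round(c, scale).as(named.name)
    // Arbitrary expressions such as col("x") + 1 do not
    case _                      => round(c, scale)
  }

// Example calls; building these columns does not need a running session
val named   = roundKeepNameSafe(col("x"), 2)      // aliased back to "x"
val unnamed = roundKeepNameSafe(col("x") + 1, 2)  // no alias applied
```

If you would rather fail loudly for unnamed expressions, the second case could throw an IllegalArgumentException instead.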

EDIT: Finally, if you are OK with passing the name of the column instead of the Column object, you can change the signature of the function, which yields a much simpler implementation:

def roundKeepName(columnName:String, scale:Int) = 
    round(col(columnName),scale).as(columnName)
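
The String-based version also makes it easy to round several columns at once. A rough sketch, assuming you want to keep all other columns untouched; roundColumns is a hypothetical helper name, not something Spark provides:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, round}

// Same function as above, repeated so the snippet is self-contained
def roundKeepName(columnName: String, scale: Int): Column =
  round(col(columnName), scale).as(columnName)

// Hypothetical helper: round the listed columns and pass the rest through,
// preserving the original column order and names.
def roundColumns(df: DataFrame, names: Set[String], scale: Int): DataFrame =
  df.select(df.columns.map { n =>
    if (names.contains(n)) roundKeepName(n, scale) else col(n)
  }: _*)
```

Usage would look like roundColumns(df, Set("x", "y"), 2). Note that col(n) interprets dots in a name as nested field access, so column names containing dots would need backtick quoting.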
Oli answered Oct 27 '25 15:10
