How to get the name of a Spark Column as String?

Question

I want to write a method to round a numeric column without doing something like:

df
.select(round($"x",2).as("x"))

Therefore I need to have a reusable column-expression like:

def roundKeepName(c:Column,scale:Int) = round(c,scale).as(c.name)

Unfortunately c.name does not exist, so the above code does not compile. I've found a solution for ColumnName:

 def roundKeepName(c:ColumnName,scale:Int) = round(c,scale).as(c.string.name)

But how can I do that with a Column (which is what I get if I use col("x") instead of $"x")?

Oli · Accepted Answer

Not sure if the question has really been answered. Your function could be implemented like this (for a plain column reference such as col("x") or $"x", toString returns the name of the column):

def roundKeepname(c:Column,scale:Int) = round(c,scale).as(c.toString)
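To see what toString gives you before relying on it, here is a minimal sketch that compares a bare column reference with a derived expression. The value names plain and derived are only illustrative, and the snippet is meant to be pasted into spark-shell or a Scala script with Spark SQL on the classpath.

```scala
import org.apache.spark.sql.functions.col

// Illustrative values, not part of the original question.
val plain = col("x")        // bare column reference
val derived = col("x") + 1  // arithmetic expression built on x

println(plain.toString)     // the bare column name
println(derived.toString)   // a rendering of the whole expression, not just "x"
```

So the toString variant keeps the original name only when the argument is a simple reference. For anything derived, the alias becomes the text of the expression.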

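In case you don't like relying on toString, here is a more explicit version. You can take the underlying Catalyst expression, cast it to a NamedExpression and use its name. Note that the cast throws a ClassCastException if the column is not a named expression (for instance col("x") + 1).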

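import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.round
def roundKeepname(c: Column, scale: Int): Column =
    round(c, scale).as(c.expr.asInstanceOf[NamedExpression].name)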

And it works:

scala> spark.range(2).select(roundKeepname('id, 2)).show
+---+
| id|
+---+
|  0|
|  1|
+---+
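If the hard cast feels too brittle, one possible variation is to pattern match on the expression and fall back to toString when the column is not a named expression. This is only a sketch: the helper columnName is a hypothetical name introduced here, and it assumes a Spark release in which Column.expr is accessible (it belongs to the internal Catalyst API and may change between versions).

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.{col, round}

// Hypothetical helper: best-effort name of a column.
def columnName(c: Column): String = c.expr match {
  case named: NamedExpression => named.name // attribute references and aliases
  case _                      => c.toString // anything else: fall back to the rendered expression
}

// Round the column and alias the result with the name derived above.
def roundKeepName(c: Column, scale: Int): Column =
  round(c, scale).as(columnName(c))
```

With this, roundKeepName(col("x"), 2) is aliased as x, and an expression such as col("x") + 1 no longer throws but gets its rendered text as the alias.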

EDIT: Finally, if you are fine with passing the name of the column instead of the Column object, you can change the signature of the function, which yields a much simpler implementation:

def roundKeepName(columnName:String, scale:Int) = 
    round(col(columnName),scale).as(columnName)
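As a usage sketch of the String-based signature, the function can be mapped over all column names of a DataFrame. The local SparkSession, the sample data and the names df and rounded are assumptions made for illustration. In spark-shell the session already exists and getOrCreate simply returns it.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, round}

def roundKeepName(columnName: String, scale: Int): Column =
  round(col(columnName), scale).as(columnName)

// Assumed local session, for illustration only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("roundKeepName-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up sample data with two numeric columns.
val df = Seq((1.2345, 2.3456), (3.4567, 4.5678)).toDF("x", "y")

// Round every column to 2 decimals while keeping the original names.
val rounded = df.select(df.columns.map(roundKeepName(_, 2)): _*)
rounded.printSchema()
```

Since df.columns already returns the names as strings, no Column-to-name conversion is needed in this case.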

Tags: scala, apache-spark

Raphael Roth
