When can Symbols be used to represent columns in Spark SQL?

Consider a basic groupBy expression on a DataFrame:

val groupDf  = rsdf.groupBy("league","vendor").agg(mean('league),mean('vendor))

The groupBy portion is fine: it uses strings for column names. However, the agg (/mean) portion is not, since apparently Symbols are not supported there.

I am wondering why Symbols do not work here, and when they are permitted in Spark SQL.

WestCoastProjects asked Sep 02 '25

1 Answer

The short answer is never: there are no DataFrame methods which support Symbols directly.

The long answer is anywhere the Spark compiler expects a Column, but you'll need additional objects in scope.
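To illustrate, the conversion applies to any method that takes a Column, not just agg. A minimal sketch, assuming a SparkSession named spark and a DataFrame df with an id column:

```scala
import spark.implicits._                  // brings the Symbol -> Column conversion into scope
import org.apache.spark.sql.DataFrame

def examples(df: DataFrame): Unit = {
  df.select('id)        // Symbol converted to Column where select expects Column*
  df.filter('id > 0)    // conversion also kicks in inside expressions
  df.orderBy('id.desc)  // the resulting Column supports methods like .desc
}
```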

The only reason why Symbols work at all is the implicit conversion from Symbol to Column provided by SQLImplicits.

Once it is imported, the compiler will be able to convert a Symbol wherever a Column is required, including in agg (as long as the implicits are in scope):

import spark.implicits._
import org.apache.spark.sql.functions._

val df = Seq((1, 2)).toDF("league", "vendor")

df.groupBy("league","vendor").agg(mean('league),mean('vendor)).show

+------+------+-----------+-----------+                                         
|league|vendor|avg(league)|avg(vendor)|
+------+------+-----------+-----------+
|     1|     2|        1.0|        2.0|
+------+------+-----------+-----------+
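For comparison, the same aggregation can be written with the other common column notations; a sketch, assuming the same df and imports as above:

```scala
import spark.implicits._                    // provides the $"..." interpolator and Symbol conversion
import org.apache.spark.sql.functions._

// $-interpolator, also supplied by SQLImplicits:
df.groupBy("league", "vendor").agg(mean($"league"), mean($"vendor"))

// Explicit col() from org.apache.spark.sql.functions:
df.groupBy("league", "vendor").agg(mean(col("league")), mean(col("vendor")))

// mean also has a String overload, so no conversion is needed at all:
df.groupBy("league", "vendor").agg(mean("league"), mean("vendor"))
```

All three produce the same plan as the Symbol version; the Symbol form is merely a syntactic convenience layered on top of Column.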
Alper t. Turker answered Sep 04 '25