Consider a basic groupBy expression on a DataFrame:

val groupDf = rsdf.groupBy("league","vendor").agg(mean('league),mean('vendor))
The groupBy portion is fine: it is using strings for column names. However, the agg (and mean) calls are not, since apparently Symbols are not supported there.
I am wondering why Symbols do not work here, and when they are permitted in Spark SQL.
The short answer is never: there are no DataFrame methods which support Symbols directly.
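One way to see this: with no extra implicits in scope, a Symbol argument simply fails to compile. A minimal sketch (the exact compiler message depends on the Scala and Spark versions):

import org.apache.spark.sql.functions.mean

// Without import spark.implicits._, neither mean overload (Column or String)
// accepts a Symbol, so this line does not compile:
// rsdf.groupBy("league", "vendor").agg(mean('league))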
The long answer is anywhere the Spark compiler expects a Column, but you'll need additional objects in scope.
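For instance, with the right conversion in scope (explained next), all of these compile, since each method is declared in terms of Column; a sketch reusing the asker's rsdf:

rsdf.select('league)
rsdf.where('vendor > 1)
rsdf.orderBy('league.desc)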
The only reason why Symbols work at all is the implicit conversion from Symbol to Column provided by SQLImplicits, brought into scope with import spark.implicits._.
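Under the hood it is a one-liner; a simplified sketch of the conversion as defined in org.apache.spark.sql.SQLImplicits:

// ColumnName is a subclass of Column, so after this conversion a Symbol
// can stand in anywhere a Column parameter is expected.
implicit def symbolToColumn(s: Symbol): ColumnName = new ColumnName(s.name)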
Once it is imported, the compiler can convert a Symbol wherever a Column is required, including in agg (as long as the implicits are in scope):
import spark.implicits._
import org.apache.spark.sql.functions._
val df = Seq((1, 2)).toDF("league", "vendor")
df.groupBy("league","vendor").agg(mean('league),mean('vendor)).show
+------+------+-----------+-----------+
|league|vendor|avg(league)|avg(vendor)|
+------+------+-----------+-----------+
| 1| 2| 1.0| 2.0|
+------+------+-----------+-----------+
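For completeness, the same aggregation can be written without Symbols at all, using col from org.apache.spark.sql.functions or the $ string interpolator (also provided by spark.implicits._):

df.groupBy("league", "vendor").agg(mean(col("league")), mean(col("vendor"))).show
df.groupBy("league", "vendor").agg(mean($"league"), mean($"vendor")).show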