Spark: Split is not a member of org.apache.spark.sql.Row

Below is my code from Spark 1.6. I am trying to convert it to Spark 2.3, but I am getting an error when using split.

Spark 1.6 code:

val file = spark.textFile(args(0))
val mapping = file.map(_.split('\t')).map(a => a(1))
mapping.saveAsTextFile(args(1))

Spark 2.3 code:

val file = spark.read.text(args(0))
val mapping = file.map(_.split('\t')).map(a => a(1)) // Getting error here
mapping.write.text(args(1))

Error Message:

value split is not a member of org.apache.spark.sql.Row
asked Oct 22 '25 by Vin
1 Answer

Unlike spark.textFile, which returns an RDD, spark.read.text returns a DataFrame, which is essentially an RDD[Row]. You could perform the map with a partial function, as shown in the following example:

// /path/to/textfile:
// a    b   c
// d    e   f

import org.apache.spark.sql.Row
import spark.implicits._  // provides the encoders `map` needs (imported automatically in spark-shell)

val df = spark.read.text("/path/to/textfile")

// each Row carries one line of text in its single "value" column;
// split the line on tabs and keep the second field
df.map{ case Row(s: String) => s.split("\\t") }.map(_(1)).show
// +-----+
// |value|
// +-----+
// |    b|
// |    e|
// +-----+
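
Alternatively, here is a minimal end-to-end sketch (assuming Spark 2.3, a SparkSession named spark, and hypothetical input/output paths). Reading with .as[String] gives a Dataset[String], so split can be called on each line directly without pattern matching on Row, and the result can be written back out with write.text, mirroring the pipeline in the question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("TabSplit").getOrCreate()
import spark.implicits._  // encoders for Dataset[String]

val mapping = spark.read
  .text("/path/to/textfile")   // DataFrame with a single "value" column
  .as[String]                  // Dataset[String]: one element per line
  .map(_.split("\t")(1))       // keep the second tab-separated field

mapping.write.text("/path/to/output")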
answered Oct 23 '25 by Leo C

