 

Scala - Filter DataFrame using "endsWith"

Given a DataFrame:

 val df = sc.parallelize(List(("Mike","1986","1976"), ("Andre","1980","1966"), ("Pedro","1989","2000")))
      .toDF("info", "year1", "year2")
df.show

 +-----+-----+-----+
 | info|year1|year2|
 +-----+-----+-----+
 | Mike| 1986| 1976|
 |Andre| 1980| 1966|
 |Pedro| 1989| 2000|
 +-----+-----+-----+

I am trying to filter df for values that end with 6, but I keep getting errors. First I tried:

  val filtered = df.filter(df.col("*").endsWith("6"))
  org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: ResolvedStar(info#20, year1#21, year2#22)

I also tried this, which does not compile:

val filtered = df.select(df.col("*")).filter(_ endsWith("6"))
error: missing parameter type for expanded function ((x$1) => x$1.endsWith("6"))

How can I fix this? Thanks.

Toren asked Oct 31 '25 13:10



1 Answer

I'm not entirely sure what you are trying to do, but as I understand it, you want to keep the rows in which at least one column ends with "6":

val df = sc.parallelize(List(("Mike","1986","1976"), ("Andre","1980","1966"), ("Pedro","1989","2000"))).toDF("info", "year1", "year2")
df.show
// +-----+-----+-----+
// | info|year1|year2|
// +-----+-----+-----+
// | Mike| 1986| 1976|
// |Andre| 1980| 1966|
// |Pedro| 1989| 2000|
// +-----+-----+-----+

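// Build one Boolean expression per column, then OR them all together.
val conditions = df.columns.map(df(_).endsWith("6")).reduce(_ or _)
// Add the combined condition as a column, keep the rows where it is true, then drop it again.
df.withColumn("condition", conditions).filter($"condition" === true).drop("condition").show
// +-----+-----+-----+
// | info|year1|year2|
// +-----+-----+-----+
// |Andre| 1980| 1966|
// | Mike| 1986| 1976|
// +-----+-----+-----+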
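
The same idea can be written as a small standalone program. This is only a sketch, under a few assumptions: it uses a local SparkSession instead of the shell's sc, the object name EndsWithFilter and the helper anyColumnEndsWith are made up for illustration, every column is a string, and the DataFrame has at least one column (reduce fails on an empty list). Since filter accepts a Column directly, the temporary "condition" column is skipped here.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical wrapper object, named for this example only.
object EndsWithFilter {

  // Build a single Boolean Column that is true when at least one
  // column of `df` ends with `suffix`. Assumes string columns and
  // a DataFrame with at least one column.
  def anyColumnEndsWith(df: DataFrame, suffix: String): Column =
    df.columns.map(name => col(name).endsWith(suffix)).reduce(_ || _)

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell one already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("endsWith-filter")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("Mike", "1986", "1976"),
      ("Andre", "1980", "1966"),
      ("Pedro", "1989", "2000")
    ).toDF("info", "year1", "year2")

    // filter takes the combined Column directly.
    val filtered = df.filter(anyColumnEndsWith(df, "6"))
    filtered.show()

    spark.stop()
  }
}
```

With the sample data this keeps the Mike and Andre rows and drops Pedro, since none of Pedro's values ends with 6. To test only some columns, map over a hand-picked list such as Seq("year1", "year2") instead of df.columns.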
eliasah answered Nov 02 '25 21:11
