Spark - Remove intersecting elements between two array type columns

I have a DataFrame like this:

+----------+--------------------+---------------------------+
|      Name|                rem1|                      quota|
+----------+--------------------+---------------------------+
|Customer_3|[258, 259, 260, 2...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_4|[18, 19, 20, 27, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_5|[16, 17, 51, 52, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_6|[6, 7, 8, 9, 10, ...|[1, 2, 3, 4, 5, 6, 7,..500]|
|Customer_7|[0, 30, 31, 32, 3...|[1, 2, 3, 4, 5, 6, 7,..500]|
+----------+--------------------+---------------------------+

I would like to remove the values listed in rem1 from quota and store the result in a new column. I have tried:

val dfleft = dfpci_remove2.withColumn("left",$"quota".filter($"rem1"))

<console>:123: error: value filter is not a member of org.apache.spark.sql.ColumnName

Please advise.

getitout asked Sep 05 '25 03:09



1 Answer

filter is not a method on a Column, which is why the compiler rejects $"quota".filter(...). You can write a UDF instead, as below:

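  import org.apache.spark.sql.functions.udf

  // a diff b keeps the elements of a that are not in b
  val filterList = udf((a: Seq[Int], b: Seq[Int]) => a diff b)
  // quota first, rem1 second: removes rem1's values from quota
  df.withColumn("left", filterList($"quota", $"rem1"))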

This should give you the expected result.
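
To see how the arguments behave, here is a small plain-Scala sketch (no Spark needed) of what diff does inside the UDF. The object name DiffSketch, the helper remaining, and the sample values are made up for illustration; they are not from the question's data. `a diff b` keeps the elements of `a` that are not in `b`, so the order of the arguments decides which column is subtracted from which.

```scala
// Plain-Scala sketch of the logic inside the UDF; names and values are hypothetical.
object DiffSketch {
  // Same body as the UDF's lambda: elements of quota that are not in rem1.
  def remaining(quota: Seq[Int], rem1: Seq[Int]): Seq[Int] = quota diff rem1

  def main(args: Array[String]): Unit = {
    val quota = Seq(1, 2, 3, 4, 5) // made-up sample values
    val rem1  = Seq(2, 4, 9)

    println(remaining(quota, rem1)) // List(1, 3, 5)
    println(remaining(rem1, quota)) // List(9)

    // diff is a multiset difference: each element of the second
    // sequence cancels only one matching occurrence in the first.
    println(Seq(1, 1, 2) diff Seq(1))                // List(1, 2)
    // filterNot with a Set drops every occurrence instead.
    println(Seq(1, 1, 2).filterNot(Set(1).contains)) // List(2)
  }
}
```

If the arrays can contain repeated values and every occurrence should go, the filterNot form is probably closer to what you want than diff.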

Hope this helps!

koiralo answered Sep 07 '25 20:09
