I have to fill the first null values with immediate value of the same column in dataframe. This logic applies only on first consecutive null values only of the column.
I have a dataframe with similar to below
//I replaced null to 0 in value column
val df = Seq( (0,"exA",30), (0,"exB",22), (0,"exC",19), (16,"exD",13),
(5,"exE",28), (6,"exF",26), (0,"exG",12), (13,"exH",53))
.toDF("value", "col2", "col3")
scala> df.show(false)
+-----+----+----+
|value|col2|col3|
+-----+----+----+
|0 |exA |30 |
|0 |exB |22 |
|0 |exC |19 |
|16 |exD |13 |
|5 |exE |28 |
|6 |exF |26 |
|0 |exG |12 |
|13 |exH |53 |
+-----+----+----+
From this dataframe I am expecting as below
scala> df.show(false)
+-----+----+----+
|value|col2|col3|
+-----+----+----+
|16 |exA |30 | // Change the value 0 to 16 at value column
|16 |exB |22 | // Change the value 0 to 16 at value column
|16 |exC |19 | // Change the value 0 to 16 at value column
|16 |exD |13 |
|5 |exE |28 |
|6 |exF |26 |
|0 |exG |12 | // value should not be change here
|13 |exH |53 |
+-----+----+----+
Please help me solve this.
You can use Window function for this purpose
val df = Seq( (0,"exA",30), (0,"exB",22), (0,"exC",19), (16,"exD",13),
(5,"exE",28), (6,"exF",26), (0,"exG",12), (13,"exH",53))
.toDF("value", "col2", "col3")
val w = Window.orderBy($"col2".desc)
df.withColumn("Result", last(when($"value" === 0, null).otherwise($"value"), ignoreNulls = true).over(w))
.orderBy($"col2")
.show(10)
Will result in
+-----+----+----+------+
|value|col2|col3|Result|
+-----+----+----+------+
| 0| exA| 30| 16|
| 0| exB| 22| 16|
| 0| exC| 19| 16|
| 16| exD| 13| 16|
| 5| exE| 28| 5|
| 6| exF| 26| 6|
| 0| exG| 12| 13|
| 13| exH| 53| 13|
+-----+----+----+------+
Expression df.orderBy($"col2") is needed only to show final results in right order. You can skip it if you don't care about final order.
UPDATE To get exactly what you need you should you a little bit more complicated code
val w = Window.orderBy($"col2")
val w2 = Window.orderBy($"col2".desc)
df.withColumn("IntermediateResult", first(when($"value" === 0, null).otherwise($"value"), ignoreNulls = true).over(w))
.withColumn("Result", when($"IntermediateResult".isNull, last($"IntermediateResult", ignoreNulls = true).over(w2)).otherwise($"value"))
.orderBy($"col2")
.show(10)
+-----+----+----+------------------+------+
|value|col2|col3|IntermediateResult|Result|
+-----+----+----+------------------+------+
| 0| exA| 30| null| 16|
| 0| exB| 22| null| 16|
| 0| exC| 19| null| 16|
| 16| exD| 13| 16| 16|
| 5| exE| 28| 16| 5|
| 6| exF| 26| 16| 6|
| 0| exG| 12| 16| 0|
| 13| exH| 53| 16| 13|
+-----+----+----+------------------+------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With