I need to use the date_add() function to add 90 days to a DataFrame column. It works correctly when I hardcode the 90, but when the number of days lives in another column and I pass that column instead, the compiler complains that it expects an integer.
This code works:
.withColumn("DATE_SUM_COLUMN",date_add(col("DATE_COLUMN"),90))
This code does not:
.withColumn("DATE_SUM_COLUMN",date_add(col("DATE_COLUMN"),col("number")))
Thanks.
In the Scala API, date_add takes the days argument as an Int, which is why passing a Column fails to compile. You can still use expr("date_add(date_column, days_to_add)") to evaluate a Spark SQL expression, where both arguments are resolved as columns:
import java.sql.Date
import com.holdenkarau.spark.testing.{DataFrameSuiteBase, SharedSparkContext}
import org.scalatest.FlatSpec
import org.apache.spark.sql.functions.expr

class TestSo2 extends FlatSpec with SharedSparkContext with DataFrameSuiteBase {

  "date_add" should "add a number of days specified as a Column" in {
    import spark.implicits._

    val df = Seq(
      (Date.valueOf("2019-01-01"), 31),
      (Date.valueOf("2019-01-01"), 32)
    ).toDF("date_column", "days_to_add")

    df.show()
    /**
     * +-----------+-----------+
     * |date_column|days_to_add|
     * +-----------+-----------+
     * | 2019-01-01|         31|
     * | 2019-01-01|         32|
     * +-----------+-----------+
     */

    df
      .withColumn("next_date", expr("date_add(date_column, days_to_add)"))
      .show()
    /**
     * +-----------+-----------+----------+
     * |date_column|days_to_add| next_date|
     * +-----------+-----------+----------+
     * | 2019-01-01|         31|2019-02-01|
     * | 2019-01-01|         32|2019-02-02|
     * +-----------+-----------+----------+
     */
  }
}
I don't know why the Spark developers did not expose this overload in the Scala API, though.
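For what it's worth, newer releases appear to have addressed this: if I read the Spark 3.x functions API correctly, date_add gained an overload that accepts the days argument as a Column, so on Spark 3.0+ the original code from the question should compile as-is (a sketch, assuming that overload is available in your version):

import org.apache.spark.sql.functions.{col, date_add}

// Assumed to exist in Spark 3.0+:
//   def date_add(start: Column, days: Column): Column
df.withColumn("DATE_SUM_COLUMN", date_add(col("DATE_COLUMN"), col("number")))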
Please try this: here I am converting the date to seconds, converting the days column to seconds, and summing the two columns. Then we convert the final result back to date format. Here date is my date column and add is the number of days to add to it.
import org.apache.spark.sql.functions._

.withColumn("new_col", to_date(from_unixtime(unix_timestamp($"date", "yyyy-MM-dd") + col("add") * 24 * 60 * 60)))
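A minimal end-to-end sketch of the same idea (the DataFrame and column names are illustrative, not from the question). One design caveat: adding a fixed 86400 seconds per day can drift by an hour across daylight-saving transitions in some session time zones, which the built-in date_add avoids:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object AddDaysViaSeconds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("add-days").getOrCreate()
    import spark.implicits._

    // Illustrative data: a date string column and a days-to-add column
    val df = Seq(("2019-01-01", 31), ("2019-01-01", 32)).toDF("date", "add")

    df.withColumn(
        "new_col",
        // date -> epoch seconds, plus the days column converted to seconds,
        // then back through a timestamp string to a date
        to_date(from_unixtime(unix_timestamp($"date", "yyyy-MM-dd") + $"add" * 24 * 60 * 60))
      )
      .show()

    spark.stop()
  }
}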