
How to filter by date range in Spark SQL

I'm trying to filter a date range from the following data using Databricks, but the query returns null. My CSV data looks like this:

ID, Desc, Week_Ending_Date
100, AAA, 13-06-2015
101, BBB, 11-07-2015
102, CCC, 15-08-2015
103, DDD, 05-09-2015
100, AAA, 29-08-2015
100, AAA, 22-08-2015

My query is:

df.select(df("ID"), date_format(df("Week_Ending_Date"), "yyyy-MM-dd"))
.filter(date_format(df("Week_Ending_Date"), "yyyy-MM-  dd").between("2015-07-05", "2015-09-02"))
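
As far as I can tell, the null comes from date_format being applied to the raw string column: a value like 13-06-2015 cannot be implicitly cast to a timestamp, so the formatted column is null on every row. A quick check against the same df:

import org.apache.spark.sql.functions._

// the formatted column comes back null for every row, because the implicit
// string -> timestamp cast only understands yyyy-MM-dd style values
df.select(df("Week_Ending_Date"), date_format(df("Week_Ending_Date"), "yyyy-MM-dd")).show()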

Any help is much appreciated.

asked by prit4fun


1 Answer

Off the top of my head, I would convert the date column to java.sql.Date while building the DataFrame, and then apply the filter on an aliased, formatted column:

import java.text.SimpleDateFormat
import org.apache.spark.sql.functions._
import sqlContext.implicits._

// parse the dd-MM-yyyy strings into java.sql.Date while building the DataFrame
val format = new SimpleDateFormat("dd-MM-yyyy")
val data = sc.parallelize(List(
    (100, "AAA", "13-06-2015"), (101, "BBB", "11-07-2015"), (102, "CCC", "15-08-2015"),
    (103, "DDD", "05-09-2015"), (100, "AAA", "29-08-2015"), (100, "AAA", "22-08-2015")))
  .map { r =>
    val date = new java.sql.Date(format.parse(r._3).getTime)
    (r._1, r._2, date)
  }.toDF("ID", "Desc", "Week_Ending_Date")

data.show

//+---+----+----------------+
//| ID|Desc|Week_Ending_Date|
//+---+----+----------------+
//|100| AAA|      2015-06-13|
//|101| BBB|      2015-07-11|
//|102| CCC|      2015-08-15|
//|103| DDD|      2015-09-05|
//|100| AAA|      2015-08-29|
//|100| AAA|      2015-08-22|
//+---+----+----------------+

val filteredData = data
  .select(data("ID"), date_format(data("Week_Ending_Date"), "yyyy-MM-dd").alias("date"))
  .filter($"date".between("2015-07-05", "2015-09-02"))

filteredData.show

//+---+----------+
//| ID|      date|
//+---+----------+
//|101|2015-07-11|
//|102|2015-08-15|
//|100|2015-08-29|
//|100|2015-08-22|
//+---+----------+
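
If you want to avoid the round trip through an RDD, the same parsing can be done directly in the DataFrame API. This is only a sketch, assuming a Spark 1.5+ setup where df is the string-typed DataFrame from the question and the column names match the CSV header:

import org.apache.spark.sql.functions._
import sqlContext.implicits._

// unix_timestamp parses the dd-MM-yyyy strings to epoch seconds; casting to timestamp
// and re-formatting as yyyy-MM-dd gives a sortable string column that between() can
// filter on, just like the aliased "date" column above
val filteredDf = df
  .withColumn("date", date_format(unix_timestamp($"Week_Ending_Date", "dd-MM-yyyy").cast("timestamp"), "yyyy-MM-dd"))
  .filter($"date".between("2015-07-05", "2015-09-02"))
  .select($"ID", $"date")

filteredDf.show

The advantage is that the parsing stays inside DataFrame expressions instead of a custom map over an RDD, so the same code also works when the data comes straight from a CSV reader.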
answered by eliasah