spark-workshop

Exercise: Difference in Days Between Dates As Strings

Write a structured query (using spark-shell or Databricks Community Edition) that calculates the number of days between dates given as text (in some format) and the current date.

Protip™: Use the standard functions for date and time, e.g. to_date and datediff

Module: Spark SQL

Duration: 30 mins

Input Dataset

val dates = Seq(
   "08/11/2015",
   "09/11/2015",
   "09/12/2015").toDF("date_string")
scala> dates.show
+-----------+
|date_string|
+-----------+
| 08/11/2015|
| 09/11/2015|
| 09/12/2015|
+-----------+

Result

+-----------+----------+----+
|date_string|   to_date|diff|
+-----------+----------+----+
| 08/11/2015|2015-11-08|1256|
| 09/11/2015|2015-11-09|1255|
| 09/12/2015|2015-12-09|1225|
+-----------+----------+----+

NOTE: The diff column is current_date-sensitive and your result will certainly be different (as the current date of yours will be after the page was published).