spark-workshop

Exercise: Finding Longest Sequence (Window Aggregation)

Write a structured query that finds, for each ID, the length of the longest sequence of consecutive time values.

Protip™: Use the rank standard function over a window, followed by the Dataset.groupBy operator to count the rows that share the same difference between time and rank

Module: Spark SQL

Duration: 30 mins

Input Dataset

ID,time
1,1
1,2
1,4
1,7
1,8
1,9
2,1
3,1
3,2
3,3

val visits = spark
  .read
  .option("header", true)
  .option("inferSchema", true)
  .csv("visits.csv")

scala> visits.show
+---+----+
| ID|time|
+---+----+
|  1|   1|
|  1|   2|
|  1|   4|
|  1|   7|
|  1|   8|
|  1|   9|
|  2|   1|
|  3|   1|
|  3|   2|
|  3|   3|
+---+----+

Result

+---+----+
| ID|time|
+---+----+
|  1|   3|
|  2|   1|
|  3|   3|
+---+----+
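
One possible way to arrive at this result is sketched below. It is not necessarily the intended solution. It assumes that time values are distinct within each ID, so that rank behaves like a row number. The window name byId, the helper column run, and the inline construction of visits (used here instead of reading visits.csv) are illustrative choices and do not come from the exercise. The idea is that within a run of consecutive times, both time and rank grow by one per row, so their difference stays constant and can serve as a grouping key.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("longest-sequence")
  .getOrCreate()
import spark.implicits._

// Same rows as the input dataset, built inline so the sketch runs without the CSV file.
val visits = Seq(
  (1, 1), (1, 2), (1, 4), (1, 7), (1, 8), (1, 9),
  (2, 1),
  (3, 1), (3, 2), (3, 3)
).toDF("ID", "time")

// Rank rows by time within each ID (hypothetical name: byId).
val byId = Window.partitionBy("ID").orderBy("time")

val longest = visits
  .withColumn("rank", rank().over(byId))
  // Within a run of consecutive times, time - rank is constant (hypothetical column: run).
  .withColumn("run", $"time" - $"rank")
  // Count the rows in every run...
  .groupBy("ID", "run")
  .count()
  // ...and keep the size of the largest run per ID.
  .groupBy("ID")
  .agg(max("count") as "time")
  .orderBy("ID")

longest.show()
```

When pasting into spark-shell, use :paste mode so that the multi-line method chains are read as single expressions. If time values can repeat within an ID, the rows would need to be deduplicated first (for example with dropDuplicates), since ties in rank break the constant-difference property.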

Credits