spark-workshop

Exercise: Finding maximum values per group (groupBy)

Develop a standalone Spark SQL application (using IntelliJ IDEA) that finds the highest (maximum) number in each group.

Protip™: Use the Dataset.groupBy operator and the max standard function

Module: Spark SQL

Duration: 20 mins

Input Dataset

scala> val nums = spark.range(5).withColumn("group", 'id % 2)
scala> nums.show
+---+-----+
| id|group|
+---+-----+
|  0|    0|
|  1|    1|
|  2|    0|
|  3|    1|
|  4|    0|
+---+-----+

Result

+-----+------+
|group|max_id|
+-----+------+
|    0|     4|
|    1|     3|
+-----+------+
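
Solution sketch

One possible way to get the result above is sketched below. It is not the official solution, only a minimal example that follows the protip (Dataset.groupBy plus the max standard function). The object name MaxPerGroupApp, the application name and the local[*] master are assumptions made for running inside the IDE. The orderBy is added only to make the row order deterministic, and col("id") is used in place of 'id so that the standalone application does not need import spark.implicits._.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max}

// Hypothetical object name, chosen for illustration only
object MaxPerGroupApp {
  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the application can run from the IDE
    val spark = SparkSession.builder()
      .appName("MaxPerGroup")
      .master("local[*]")
      .getOrCreate()

    // Same input as in the exercise: ids 0..4, grouped by id % 2
    val nums = spark.range(5).withColumn("group", col("id") % 2)

    val maxPerGroup = nums
      .groupBy("group")            // one bucket per distinct group value
      .agg(max("id").as("max_id")) // highest id within each bucket
      .orderBy("group")            // stable row order for display

    maxPerGroup.show()

    spark.stop()
  }
}
```

With the input dataset from the exercise, show() should print the table listed under Result. Without the alias, the aggregated column would be named max(id) rather than max_id.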