Write a structured query (using spark-shell
or Databricks Community Edition) that collects ids per group in a dataset.
Protip™: Use the collect_list standard function
Extra: The collected values should be ordered in descending order
Module: Spark SQL
Duration: 15 mins
// Input dataset: ids 0..4 with a group column (id modulo 2)
val nums = spark.range(5).withColumn("group", 'id % 2)
scala> nums.show
+---+-----+
| id|group|
+---+-----+
| 0| 0|
| 1| 1|
| 2| 0|
| 3| 1|
| 4| 0|
+---+-----+
The expected result:

+-----+---------+
|group| ids|
+-----+---------+
| 0|[0, 2, 4]|
| 1| [1, 3]|
+-----+---------+
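One possible solution (a sketch, not the only way): group by the group column and aggregate with collect_list. For the extra requirement, sort_array with asc = false gives the collected ids in descending order. The names ids and sortedIds below are only for illustration.

import org.apache.spark.sql.functions.{collect_list, sort_array}

// Collect the ids per group
val ids = nums
  .groupBy("group")
  .agg(collect_list("id") as "ids")
ids.show

// Extra: collect the ids per group in descending order
val sortedIds = nums
  .groupBy("group")
  .agg(sort_array(collect_list("id"), asc = false) as "ids")
sortedIds.show

Note that collect_list itself gives no ordering guarantee for the collected values, which is why sort_array is used to make the ordering deterministic.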