spark-workshop

Exercise: Finding First Non-Null Value per Group

Write a structured query that finds the first non-null value per group.

Protip™: Review the input arguments of the first standard function (org.apache.spark.sql.functions.first).

Module: Spark SQL

Duration: 15 mins

Input Dataset

val data = Seq(
  (None, 0),
  (None, 1),
  (Some(2), 0),
  (None, 1),
  (Some(4), 1)).toDF("id", "group")

scala> data.show
+----+-----+
|  id|group|
+----+-----+
|null|    0|
|null|    1|
|   2|    0|
|null|    1|
|   4|    1|
+----+-----+

Result

+-----+--------------+
|group|first_non_null|
+-----+--------------+
|    1|             4|
|    0|             2|
+-----+--------------+
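
One possible approach, offered as a sketch rather than the reference solution: group by the group column and aggregate with the first standard function, passing ignoreNulls = true so that null ids are skipped. The session setup, the app name, and the result value name are assumptions made for illustration; only the data, the column names, and the first_non_null alias come from the exercise.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Assumption: a local session for a standalone run. In spark-shell, `spark`
// and its implicits are already in scope, so these lines can be skipped.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("first-non-null-per-group") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Same input as in the exercise; the explicit element type only documents
// what Scala would infer anyway.
val data = Seq[(Option[Int], Int)](
  (None, 0),
  (None, 1),
  (Some(2), 0),
  (None, 1),
  (Some(4), 1)).toDF("id", "group")

// first(columnName, ignoreNulls = true) returns the first non-null value
// it encounters within each group.
val result = data
  .groupBy("group")
  .agg(first("id", ignoreNulls = true) as "first_non_null")

result.show()
```

Each group in this dataset holds exactly one non-null id, so the values are fully determined (2 for group 0, 4 for group 1). In general, first depends on the order in which rows reach the aggregation, which is not guaranteed after a shuffle, and the order of the output rows may differ from the table above.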

Credits