spark-workshop

Exercise: Finding Most Common Non-null Prefix per Group (Occurrences)

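Write a structured query that finds, for every UNIQUE_GUEST_ID, the most common non-null PREFIX (by number of occurrences). A guest with no non-null PREFIX should still appear in the result, with null as its PREFIX.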

Module: Spark SQL

Duration: 15 mins

Credits

This exercise is brought to you by Julien. Thanks.

Input Dataset

+---------------+------+
|UNIQUE_GUEST_ID|PREFIX|
+---------------+------+
|              1|    Mr|
|              1|   Mme|
|              1|    Mr|
|              1|  null|
|              1|  null|
|              1|  null|
|              2|    Mr|
|              3|  null|
+---------------+------+
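
// spark-shell imports this automatically; it is needed elsewhere for toDF
import spark.implicits._

val input = Seq(
  (1, "Mr"),
  (1, "Mme"),
  (1, "Mr"),
  (1, null),
  (1, null),
  (1, null),
  (2, "Mr"),
  (3, null)).toDF("UNIQUE_GUEST_ID", "PREFIX")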

Result

+---------------+------+
|UNIQUE_GUEST_ID|PREFIX|
+---------------+------+
|              1|    Mr|
|              2|    Mr|
|              3|  null|
+---------------+------+
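One possible solution, sketched with the DataFrame API: count each (UNIQUE_GUEST_ID, PREFIX) pair, then rank the pairs within each guest using a window that puts non-null prefixes ahead of null ones, so a null only wins when a guest has nothing else. The exercise does not say how to break ties between equally common prefixes; ordering alphabetically by PREFIX is an assumption made here only to keep the output deterministic. The session settings and the names `counts`, `byGuest` and `result` are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Reuses the active session in spark-shell; otherwise starts a local one
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("most-common-prefix")
  .getOrCreate()
import spark.implicits._

// Same data as the Input Dataset section
val input = Seq(
  (1, "Mr"), (1, "Mme"), (1, "Mr"), (1, null), (1, null), (1, null),
  (2, "Mr"), (3, null)).toDF("UNIQUE_GUEST_ID", "PREFIX")

// Step 1: number of occurrences of each PREFIX per guest (nulls form their own group)
val counts = input.groupBy("UNIQUE_GUEST_ID", "PREFIX").count()

// Step 2: within each guest, non-null prefixes first (false sorts before true),
// then the highest count; PREFIX ascending is an assumed tie-breaker
val byGuest = Window
  .partitionBy("UNIQUE_GUEST_ID")
  .orderBy($"PREFIX".isNull.asc, $"count".desc, $"PREFIX".asc)

// Step 3: keep only the top-ranked row per guest
val result = counts
  .withColumn("rank", row_number().over(byGuest))
  .where($"rank" === 1)
  .select("UNIQUE_GUEST_ID", "PREFIX")
  .orderBy("UNIQUE_GUEST_ID")

result.show()
```

In spark-shell, paste it with `:paste` so the lines starting with `.` are parsed as part of the preceding statement. The printed table should match the Result section. Filtering out null prefixes before grouping and then outer-joining the distinct guest IDs back in would also work; ordering on `isNull` inside the window simply avoids that extra join.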