
Exercise: Using Dataset.flatMap Operator

Write a structured query (using spark-shell or Databricks Community Edition) that creates as many rows as there are elements in a given array column. The values of the new rows should be the elements of that array column.

Protip™: Use the Dataset.flatMap operator
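
For reference, a minimal sketch of how Dataset.flatMap behaves, using made-up lines/words data that is not part of the exercise: every input element can produce zero or more output elements, and the results are flattened into a single Dataset.

// Hypothetical illustration (runs as-is in spark-shell, where spark.implicits._ is in scope)
val lines = Seq("hello world", "spark sql").toDS   // Dataset[String]
val words = lines.flatMap(_.split("\\s+"))         // one output row per word
words.show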

Module: Spark SQL

Duration: 30 mins

Input Dataset

val nums = Seq(Seq(1,2,3)).toDF("nums")

scala> nums.printSchema
root
 |-- nums: array (nullable = true)
 |    |-- element: integer (containsNull = false)


scala> nums.show
+---------+
|     nums|
+---------+
|[1, 2, 3]|
+---------+

Result

+---------+---+
|     nums|num|
+---------+---+
|[1, 2, 3]|  1|
|[1, 2, 3]|  2|
|[1, 2, 3]|  3|
+---------+---+

Please note that the output has two columns (not one!).
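
One possible solution sketch (not necessarily the only or intended one), assuming the nums Dataset defined above: view the single array column as a typed Dataset[Seq[Int]], then flatMap each array into (array, element) pairs so both columns survive.

// Sketch only: pair every element with its source array to keep both columns.
val solution = nums
  .as[Seq[Int]]                         // typed view over the array column
  .flatMap(ns => ns.map(n => (ns, n)))  // one output row per array element
  .toDF("nums", "num")

solution.show

The same result can also be produced with the explode standard function, e.g. nums.withColumn("num", explode($"nums")) after import org.apache.spark.sql.functions.explode, but the point of the exercise is to practice Dataset.flatMap.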

  1. Scaladoc of the Dataset API