Develop a standalone Spark SQL application (using IntelliJ IDEA) that creates a new row for every element in the given array column.
Use the web UI to compare the performance (query plans) of queries that use the explode standard function and the Dataset.flatMap operator.
Think about the differences between the explode function and the flatMap operator. Are there any? What are they? Can you generate new rows? How many?
Module: Spark SQL
Duration: 30 mins
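A minimal skeleton for the standalone application could look as follows. This is only a sketch; the object name `ExplodeApp` and the `local[*]` master are assumptions for local development in IntelliJ IDEA, not part of the exercise statement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Hypothetical application name; run it from IntelliJ IDEA
object ExplodeApp extends App {
  val spark = SparkSession.builder
    .appName("ExplodeApp")
    .master("local[*]") // assumption: local mode for development
    .getOrCreate()
  import spark.implicits._

  val nums = Seq(Seq(1, 2, 3)).toDF("nums")

  // explode creates a new row for every element of the array column
  nums.select($"nums", explode($"nums") as "num").show

  spark.stop()
}
```

In a standalone application (unlike spark-shell) you have to create the SparkSession yourself and bring in `spark.implicits._` for `toDF` and the `$"..."` column syntax.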
scala> val nums = Seq(Seq(1,2,3)).toDF("nums")
scala> nums.printSchema
root
|-- nums: array (nullable = true)
| |-- element: integer (containsNull = false)
scala> nums.show
+---------+
| nums|
+---------+
|[1, 2, 3]|
+---------+
The expected result, with one row per array element, is:
+---------+---+
| nums|num|
+---------+---+
|[1, 2, 3]| 1|
|[1, 2, 3]| 2|
|[1, 2, 3]| 3|
+---------+---+
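One possible way to produce the result above with both approaches, as a sketch to start from (the intermediate value names are assumptions):

```scala
import org.apache.spark.sql.functions.explode

// explode standard function: stays in the untyped Column API,
// so the optimizer sees a Generate operator in the query plan
val exploded = nums.select($"nums", explode($"nums") as "num")
exploded.explain

// Dataset.flatMap operator: typed API with an arbitrary Scala function,
// which is opaque to the optimizer (note SerializeFromObject /
// DeserializeToObject in the query plan)
val flatMapped = nums
  .as[Seq[Int]]
  .flatMap(ns => ns.map(n => (ns, n)))
  .toDF("nums", "num")
flatMapped.explain
```

Comparing the two `explain` outputs (and the corresponding jobs in the web UI) should suggest an answer to the questions above: both can generate any number of new rows per input row, but they go through different query plans.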