Write a structured query that pivots a dataset on multiple columns. Since pivot
aggregation allows for a single column only, find a solution to pivot on two or more columns.
Protip™: Use RelationalGroupedDataset.pivot
and Dataset.join
operators
Module: Spark SQL
Duration: 30 mins
+---+---+-----+-----+
| id|day|price|units|
+---+---+-----+-----+
|100| 1| 23| 10|
|100| 2| 45| 11|
|100| 3| 67| 12|
|100| 4| 78| 13|
|101| 1| 23| 10|
|101| 2| 45| 13|
|101| 3| 67| 14|
|101| 4| 78| 15|
|102| 1| 23| 10|
|102| 2| 45| 11|
|102| 3| 67| 16|
|102| 4| 78| 18|
+---+---+-----+-----+
val data = Seq(
(100,1,23,10),
(100,2,45,11),
(100,3,67,12),
(100,4,78,13),
(101,1,23,10),
(101,2,45,13),
(101,3,67,14),
(101,4,78,15),
(102,1,23,10),
(102,2,45,11),
(102,3,67,16),
(102,4,78,18)).toDF("id", "day", "price", "units")
+---+-------+-------+-------+-------+------+------+------+------+
| id|price_1|price_2|price_3|price_4|unit_1|unit_2|unit_3|unit_4|
+---+-------+-------+-------+-------+------+------+------+------+
|100| 23| 45| 67| 78| 10| 11| 12| 13|
|101| 23| 45| 67| 78| 10| 13| 14| 15|
|102| 23| 45| 67| 78| 10| 11| 16| 18|
+---+-------+-------+-------+-------+------+------+------+------+
Think how you can avoid multiple scans over the input dataset.