spark-workshop

Structs for column names and values

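Write a structured query that “transposes” a dataset so that the new dataset takes its column names and values from a column of structs: every movieName becomes a column, and the matching rating becomes the value in that column.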

Module: Spark SQL

Duration: 30 mins

Input Dataset

case class MovieRatings(movieName: String, rating: Double)
case class MovieCritics(name: String, movieRatings: Seq[MovieRatings])
val movies_critics = Seq(
  MovieCritics("Manuel", Seq(MovieRatings("Logan", 1.5), MovieRatings("Zoolander", 3), MovieRatings("John Wick", 2.5))),
  MovieCritics("John", Seq(MovieRatings("Logan", 2), MovieRatings("Zoolander", 3.5), MovieRatings("John Wick", 3))))
val ratings = movies_critics.toDF
scala> ratings.show(truncate = false)
+------+--------------------------------------------------+
|name  |movieRatings                                      |
+------+--------------------------------------------------+
|Manuel|[[Logan, 1.5], [Zoolander, 3.0], [John Wick, 2.5]]|
|John  |[[Logan, 2.0], [Zoolander, 3.5], [John Wick, 3.0]]|
+------+--------------------------------------------------+

Result

scala> solution.show(truncate = false)
+------+-----+---------+---------+
|name  |Logan|Zoolander|John Wick|
+------+-----+---------+---------+
|Manuel|1.5  |3.0      |2.5      |
|John  |2.0  |3.5      |3.0      |
+------+-----+---------+---------+
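
One possible approach, sketched here rather than given as the reference solution, is to explode the array of structs into one row per (critic, movie) pair and then pivot on the movie name. The sketch assumes a spark-shell session (so `spark` is already defined) and Spark 2.x or later. The names `flat` and `movieNames` are introduced only for illustration, and the column order is hard-coded to match the Result table, since `pivot` without explicit values sorts the column names alphabetically.

```scala
// Paste into spark-shell with :paste. `spark` is assumed to be the session
// that spark-shell creates; re-importing the implicits is harmless there.
import org.apache.spark.sql.functions.{explode, first}
import spark.implicits._

case class MovieRatings(movieName: String, rating: Double)
case class MovieCritics(name: String, movieRatings: Seq[MovieRatings])

val ratings = Seq(
  MovieCritics("Manuel", Seq(MovieRatings("Logan", 1.5), MovieRatings("Zoolander", 3.0), MovieRatings("John Wick", 2.5))),
  MovieCritics("John", Seq(MovieRatings("Logan", 2.0), MovieRatings("Zoolander", 3.5), MovieRatings("John Wick", 3.0)))
).toDF

// Step 1: one row per (critic, movie) pair, with the struct fields
// pulled out into top-level columns.
val flat = ratings
  .select($"name", explode($"movieRatings") as "movieRating")
  .select($"name", $"movieRating.movieName" as "movieName", $"movieRating.rating" as "rating")

// Step 2: turn movie names into columns. Listing the values explicitly
// (an assumption made here to reproduce the expected column order)
// also saves Spark a pass over the data to discover them.
val movieNames = Seq("Logan", "Zoolander", "John Wick")

val solution = flat
  .groupBy("name")
  .pivot("movieName", movieNames)
  .agg(first("rating"))

solution.show(truncate = false)
```

Each critic rates a movie at most once in this dataset, so `first` simply picks that single rating. With duplicate ratings, an aggregate such as `avg` or `max` would be a more deliberate choice. Row order after `groupBy` is not guaranteed, so the rows may come back in a different order than in the Result table.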