Write a structured query (using spark-shell
or Databricks Community Edition) that gives the 1st and 2nd bestsellers per genre.
Protip™: Use rank standard function
Module: Spark SQL
Duration: 15 mins
id,title,genre,quantity
1,Hunter Fields,romance,15
2,Leonard Lewis,thriller,81
3,Jason Dawson,thriller,90
4,Andre Grant,thriller,25
5,Earl Walton,romance,40
6,Alan Hanson,romance,24
7,Clyde Matthews,thriller,31
8,Josephine Leonard,thriller,1
9,Owen Boone,sci-fi,27
10,Max McBride,romance,75
val books = spark
.read
.option("header", true)
.option("inferSchema", true)
.csv("books.csv")
scala> books.show
+---+-----------------+--------+--------+
| id| title| genre|quantity|
+---+-----------------+--------+--------+
| 1| Hunter Fields| romance| 15|
| 2| Leonard Lewis|thriller| 81|
| 3| Jason Dawson|thriller| 90|
| 4| Andre Grant|thriller| 25|
| 5| Earl Walton| romance| 40|
| 6| Alan Hanson| romance| 24|
| 7| Clyde Matthews|thriller| 31|
| 8|Josephine Leonard|thriller| 1|
| 9| Owen Boone| sci-fi| 27|
| 10| Max McBride| romance| 75|
+---+-----------------+--------+--------+
NOTE: Use Online Generate Test Data for more sophisticated datasets in CSV or JSON format.
+---+-------------+--------+--------+----+
| id| title| genre|quantity|rank|
+---+-------------+--------+--------+----+
| 10| Max McBride| romance| 75| 1|
| 5| Earl Walton| romance| 40| 2|
| 3| Jason Dawson|thriller| 90| 1|
| 2|Leonard Lewis|thriller| 81| 2|
| 9| Owen Boone| sci-fi| 27| 1|
+---+-------------+--------+--------+----+