Write a structured query that shows the difference in salaries between the top-paid employee and others per department. In other words, we want to know how much more the highest-paid employee gets compared to other teammates.
The exercise could also be described as “Calculating the gap between the current book and the bestseller per genre” (given the other exercise with book sales and bestsellers).
Protip™: Use max or first standard functions
Module: Spark SQL
Duration: 15 mins
id,name,department,salary
1,Hunter Fields,IT,15
2,Leonard Lewis,Support,81
3,Jason Dawson,Support,90
4,Andre Grant,Support,25
5,Earl Walton,IT,40
6,Alan Hanson,IT,24
7,Clyde Matthews,Support,31
8,Josephine Leonard,Support,1
9,Owen Boone,HR,27
10,Max McBride,IT,75
val salaries = spark
.read
.option("header", true)
.option("inferSchema", true)
.csv("salaries.csv")
scala> salaries.show
+---+-----------------+----------+------+
| id| name|department|salary|
+---+-----------------+----------+------+
| 1| Hunter Fields| IT| 15|
| 2| Leonard Lewis| Support| 81|
| 3| Jason Dawson| Support| 90|
| 4| Andre Grant| Support| 25|
| 5| Earl Walton| IT| 40|
| 6| Alan Hanson| IT| 24|
| 7| Clyde Matthews| Support| 31|
| 8|Josephine Leonard| Support| 1|
| 9| Owen Boone| HR| 27|
| 10| Max McBride| IT| 75|
+---+-----------------+----------+------+
NOTE: Use Online Generate Test Data for more sophisticated datasets in CSV or JSON format.
+---+-----------------+----------+------+----+
| id| name|department|salary|diff|
+---+-----------------+----------+------+----+
| 9| Owen Boone| HR| 27| 0|
| 1| Hunter Fields| IT| 15| 60|
| 5| Earl Walton| IT| 40| 35|
| 6| Alan Hanson| IT| 24| 51|
| 10| Max McBride| IT| 75| 0|
| 2| Leonard Lewis| Support| 81| 9|
| 3| Jason Dawson| Support| 90| 0|
| 4| Andre Grant| Support| 25| 65|
| 7| Clyde Matthews| Support| 31| 59|
| 8|Josephine Leonard| Support| 1| 89|
+---+-----------------+----------+------+----+
orderBy
influence the result? Why?
rangeBetween
)