Develop a standalone Spark SQL application (using IntelliJ IDEA) that calculates aggregations defined on a command line (e.g. finds the biggest city among the cities in a dataset).
Protip™: Use the Dataset.agg operator and standard functions only (not UDFs!)
The standalone application should take at least two input parameters, including the aggregations to calculate (e.g. max, avg).
Protip™: Mind the spaces in the population column, and then its type.
Extra: Include the name of the city when one aggregation is used.
Module: Spark SQL
Duration: 20 mins
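One possible way to turn aggregation names from the command line into a single Dataset.agg call is sketched below. The object name CityAggregations, the supported lookup table, and the aggregate helper are hypothetical names introduced for illustration, and the sketch assumes the population column has already been converted to a numeric type.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{avg, col, max, min, sum}

object CityAggregations {

  // Hypothetical lookup table: aggregation name -> standard function (no UDFs).
  val supported: Map[String, Column => Column] = Map(
    "max" -> ((c: Column) => max(c)),
    "min" -> ((c: Column) => min(c)),
    "avg" -> ((c: Column) => avg(c)),
    "sum" -> ((c: Column) => sum(c))
  )

  // Builds one aggregate column per requested name and passes them all to agg.
  // Assumes the target column is already numeric.
  def aggregate(cities: DataFrame, names: Seq[String], target: String = "population"): DataFrame = {
    require(names.nonEmpty, "At least one aggregation is required")
    val columns = names.map { name =>
      val f = supported.getOrElse(name, sys.error(s"Unsupported aggregation: $name"))
      f(col(target)).as(name)
    }
    // agg takes a first Column plus varargs, hence head and tail.
    cities.agg(columns.head, columns.tail: _*)
  }
}
```

In a main method, the aggregation names would come from the program arguments; which argument position holds them is a design choice left to the application.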
+---+-----------------+----------+
| id|             name|population|
+---+-----------------+----------+
|  0|           Warsaw| 1 764 615|
|  1|Villeneuve-Loubet|    15 020|
|  2|           Vranje|    83 524|
|  3|       Pittsburgh| 1 775 634|
+---+-----------------+----------+
id,name,population
0,Warsaw,1 764 615
1,Villeneuve-Loubet,15 020
2,Vranje,83 524
3,Pittsburgh,1 775 634
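The protip about spaces and type in the population column could be handled as in the following sketch. It assumes the CSV above is stored in a file with a header line and that the separators inside the numbers are ordinary whitespace; LoadCities and load are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object LoadCities {

  // Reads the CSV with a header, then fixes the population column:
  // 1. strip the whitespace used as a thousands separator ("1 764 615" -> "1764615")
  // 2. cast the cleaned string to a numeric type so max/avg compare numbers, not text
  def load(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .csv(path)
      .withColumn(
        "population",
        regexp_replace(col("population"), "\\s+", "").cast("long"))
}
```

Without the cast, population stays a string, so max would compare values lexicographically rather than numerically. The id column is left as a string here because no schema inference is requested.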
+----------+
|population|
+----------+
|   1775634|
+----------+
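For the Extra part, one way to keep the city name next to the maximum while still using only Dataset.agg and standard functions is to take the max of a struct whose first field is the population. This is a sketch, not necessarily the intended solution: BiggestCity and withName are hypothetical names, and population is assumed to be numeric already. The trick applies to max and min, not to avg.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, max, struct}

object BiggestCity {

  // Structs are compared field by field, so putting population first makes
  // max pick the row with the largest population and carry its name along.
  def withName(cities: DataFrame): DataFrame =
    cities
      .agg(max(struct(col("population"), col("name"))).as("biggest"))
      .select(
        col("biggest.name").as("name"),
        col("biggest.population").as("population"))
}
```

With the four cities from the input dataset and a numeric population column, this should return a single row for Pittsburgh with population 1775634.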