Write a structured query (using spark-shell or Databricks Community Edition) that returns the most populated city in each country together with its population.
Protip™: Use the Dataset.groupBy operator and the max standard function, followed by Dataset.join.
NOTE: The population column in the input dataset is a string and contains spaces.
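Because population is a string with embedded spaces, comparing it directly would sort lexicographically rather than numerically. The following is a minimal sketch of one way to derive a numeric column, assuming the separators are ordinary whitespace. The column name pop and the app name are made up for illustration; in spark-shell the getOrCreate call simply returns the session that already exists.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

// Reuses the running session in spark-shell; creates a local one elsewhere.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("population-cleanup") // hypothetical app name
  .getOrCreate()

// Two rows taken from the exercise's input, population kept as a spaced string.
val raw = spark.createDataFrame(Seq(
  ("Warsaw", "Poland", "1 764 615"),
  ("Villeneuve-Loubet", "France", "15 020")
)).toDF("name", "country", "population")

// Remove every whitespace character, then cast the digits to a 64-bit integer.
// "pop" is an illustrative column name, not part of the exercise.
val cleaned = raw.withColumn(
  "pop",
  regexp_replace(col("population"), "\\s+", "").cast("long"))

cleaned.show()
```

Keeping the original population column alongside pop makes it possible to display the population in its original formatting later.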
Module: Spark SQL
Duration: 30 mins
Input dataset:
+-----------------+-------------+----------+
|             name|      country|population|
+-----------------+-------------+----------+
|           Warsaw|       Poland| 1 764 615|
|           Cracow|       Poland|   769 498|
|            Paris|       France| 2 206 488|
|Villeneuve-Loubet|       France|    15 020|
|    Pittsburgh PA|United States|   302 407|
|       Chicago IL|United States| 2 716 000|
|     Milwaukee WI|United States|   595 351|
|          Vilnius|    Lithuania|   580 020|
|        Stockholm|       Sweden|   972 647|
|         Goteborg|       Sweden|   580 020|
+-----------------+-------------+----------+
Input dataset as CSV:
name,country,population
Warsaw,Poland,1 764 615
Cracow,Poland,769 498
Paris,France,2 206 488
Villeneuve-Loubet,France,15 020
Pittsburgh PA,United States,302 407
Chicago IL,United States,2 716 000
Milwaukee WI,United States,595 351
Vilnius,Lithuania,580 020
Stockholm,Sweden,972 647
Goteborg,Sweden,580 020
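The CSV above can be loaded with the DataFrameReader. The sketch below is one possible way to do it, not the exercise's prescribed setup: it writes a three-row excerpt to a temporary file (a stand-in for wherever the real file lives) and reads it back with the header option, so every column, including population, arrives as a string.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Reuses the running session in spark-shell; creates a local one elsewhere.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("load-cities") // hypothetical app name
  .getOrCreate()

// Excerpt of the exercise's CSV, written to a temp file for illustration only.
val csvText =
  """name,country,population
    |Warsaw,Poland,1 764 615
    |Cracow,Poland,769 498
    |Paris,France,2 206 488
    |""".stripMargin
val path = Files.createTempFile("cities", ".csv")
Files.write(path, csvText.getBytes("UTF-8"))

// header=true takes column names from the first line.
// No schema is inferred, so all three columns are strings.
val cities = spark.read
  .option("header", "true")
  .csv(path.toUri.toString)

cities.printSchema()
cities.show()
```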
Expected result:
+----------+-------------+----------+
|      name|      country|population|
+----------+-------------+----------+
|    Warsaw|       Poland| 1 764 615|
|     Paris|       France| 2 206 488|
|Chicago IL|United States| 2 716 000|
|   Vilnius|    Lithuania|   580 020|
| Stockholm|       Sweden|   972 647|
+----------+-------------+----------+
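One possible solution, following the groupBy, max and join hint, is sketched below. It is an illustration under stated assumptions rather than the reference answer: the helper column pop and the app name are invented here, and the input is built inline instead of being read from a file.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, regexp_replace}

// Reuses the running session in spark-shell; creates a local one elsewhere.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("top-cities") // hypothetical app name
  .getOrCreate()

val cities = spark.createDataFrame(Seq(
  ("Warsaw", "Poland", "1 764 615"),
  ("Cracow", "Poland", "769 498"),
  ("Paris", "France", "2 206 488"),
  ("Villeneuve-Loubet", "France", "15 020"),
  ("Pittsburgh PA", "United States", "302 407"),
  ("Chicago IL", "United States", "2 716 000"),
  ("Milwaukee WI", "United States", "595 351"),
  ("Vilnius", "Lithuania", "580 020"),
  ("Stockholm", "Sweden", "972 647"),
  ("Goteborg", "Sweden", "580 020")
)).toDF("name", "country", "population")

// Numeric copy of population ("pop" is an illustrative name).
val withPop = cities.withColumn(
  "pop",
  regexp_replace(col("population"), "\\s+", "").cast("long"))

// Largest population per country.
val maxPerCountry = withPop.groupBy("country").agg(max("pop").as("pop"))

// Join back on country AND pop to recover the city name.
val result = withPop
  .join(maxPerCountry, Seq("country", "pop"))
  .select("name", "country", "population")

result.show()
```

Joining on both country and pop matters for this dataset: Vilnius and Goteborg share the value 580 020, so a join on pop alone would wrongly pair Goteborg with Lithuania's maximum. Row order in the output is not guaranteed, and a country with two cities tied for the maximum would yield two rows.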