Write a structured query that adds aggregations to the input dataset.
EXTRA Can you think of another query to solve it? What’s more efficient?
Module: Spark SQL
Duration: 15 mins
val input = Seq(
("05:49:56.604899", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604900", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604899", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604900", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604899", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604900", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604899", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604900", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604899", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604900", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604899", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604900", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604899", "10.0.0.2.54880", "10.0.0.3.5001", 2),
("05:49:56.604908", "10.0.0.3.5001", "10.0.0.2.54880", 2),
("05:49:56.604908", "10.0.0.3.5001", "10.0.0.2.54880", 2),
("05:49:56.604908", "10.0.0.3.5001", "10.0.0.2.54880", 2),
("05:49:56.604908", "10.0.0.3.5001", "10.0.0.2.54880", 2),
("05:49:56.604908", "10.0.0.3.5001", "10.0.0.2.54880", 2),
("05:49:56.604908", "10.0.0.3.5001", "10.0.0.2.54880", 2),
("05:49:56.604908", "10.0.0.3.5001", "10.0.0.2.54880", 2)).toDF("column0", "column1", "column2", "label")
scala> input.show
+---------------+--------------+--------------+-----+
| column0| column1| column2|label|
+---------------+--------------+--------------+-----+
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2|
+---------------+--------------+--------------+-----+
scala> solution.show
+---------------+--------------+--------------+-----+-----+
| column0| column1| column2|label|count|
+---------------+--------------+--------------+-----+-----+
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604900|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604899|10.0.0.2.54880| 10.0.0.3.5001| 2| 13|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2| 7|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2| 7|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2| 7|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2| 7|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2| 7|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2| 7|
|05:49:56.604908| 10.0.0.3.5001|10.0.0.2.54880| 2| 7|
+---------------+--------------+--------------+-----+-----+