Write a structured query (using spark-shell or Databricks Community Edition) that loads the following dataset with a proper schema (the first column typed as timestamp) and prints out the rows to the standard output:
2019-07-22 00:10:15,030|10.29.2.6|
2019-07-22 00:10:15,334|10.1.198.41|
2019-07-22 00:10:15,400|10.1.198.41|
2019-07-22 00:10:15,511|10.1.198.41|
2019-07-22 00:10:16,911|10.1.198.41|
Protip™: Use the CSV data source
Module: Spark SQL
Duration: 30 mins
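
One possible solution, sketched under a few assumptions: the five sample rows are saved to a local file access.log (a hypothetical path), the field separator is |, the milliseconds are comma-separated (hence the custom timestampFormat), and the trailing | yields an empty third column that the schema declares only so a final select can drop it.

// spark is the SparkSession available in spark-shell
val solution = spark.read
  .schema("dateTime TIMESTAMP, IP STRING, trailing STRING") // DDL-style schema (Spark 2.3+)
  .option("sep", "|")
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss,SSS")     // comma before the millis
  .csv("access.log")
  .select("dateTime", "IP")                                 // drop the empty trailing column

With solution in place, the schema and the rows should look as follows: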
scala> solution.printSchema
root
|-- dateTime: timestamp (nullable = true)
|-- IP: string (nullable = true)
scala> solution.show(truncate = false)
+-----------------------+-----------+
|dateTime |IP |
+-----------------------+-----------+
|2019-07-22 00:10:15.03 |10.29.2.6 |
|2019-07-22 00:10:15.334|10.1.198.41|
|2019-07-22 00:10:15.4 |10.1.198.41|
|2019-07-22 00:10:15.511|10.1.198.41|
|2019-07-22 00:10:16.911|10.1.198.41|
+-----------------------+-----------+
NOTE: Mind the types! dateTime must be a timestamp, not a string.
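
If dateTime comes out as a string instead (e.g. when the timestampFormat option is left out), an explicit conversion with to_timestamp is a possible fallback (withTs is a hypothetical name):

import org.apache.spark.sql.functions.{col, to_timestamp}
// Parse the string column into a proper timestamp using the same pattern
val withTs = solution.withColumn(
  "dateTime", to_timestamp(col("dateTime"), "yyyy-MM-dd HH:mm:ss,SSS"))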