Spark SQL 2 / Scala Workshop 1 Day

Using Databricks Cloud

@jaceklaskowski / StackOverflow / GitHub / Mastering Apache Spark 2

https://github.com/jaceklaskowski

https://bit.ly/mastering-apache-spark

Among contributors to Apache Spark 1.6

Among contributors to Apache Spark 2

Among contributors to Apache Spark 2.1

Ranked #96 in Spark contributors

http://stackoverflow.com/users/1305344/jacek-laskowski

https://twitter.com/jaceklaskowski

Agenda (1 of 2)

  • Understanding Spark SQL
  • Just enough Scala (to learn Spark SQL in notebooks)
  • Data access (CSV, JSON, JDBC, Parquet)
  • Manipulating DataFrames
  • Creating temp tables
  • From declarative SQL to using Spark SQL programmatically

Agenda (2 of 2)

  • Standard Functions and UDFs
  • Aggregations (group by, sum, count)
  • Complex Aggregations (windows, UDAFs)
  • JOINs
  • Optimizing Queries
  • Accessing Spark Docs for Scala

Prerequisites

  1. Some programming experience in a modern programming language, e.g. Scala, Python, Java, F#
  2. Databricks Cloud Community Edition account
  3. Willingness to ask PLENTY of questions

Questions?