Spark 2 / Scala Workshop 4 Days

@jaceklaskowski / StackOverflow / GitHub / Mastering Apache Spark 2

https://github.com/jaceklaskowski

https://bit.ly/mastering-apache-spark

Among contributors to Apache Spark 1.6

Among contributors to Apache Spark 2

Among contributors to Apache Spark 2.1

Ranked #96 in Spark contributors

http://stackoverflow.com/users/1305344/jacek-laskowski

https://twitter.com/jaceklaskowski

Goal

Create prototypes in Databricks Cloud, then switch to developing full-blown Scala applications managed by sbt
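
For orientation, an sbt-managed Spark application is driven by a build.sbt file. The fragment below is only a minimal sketch: the project name and all version numbers are assumptions chosen for illustration, not values prescribed by this workshop.

```scala
// build.sbt: a minimal sketch; every name and version below is an assumption
name := "spark-workshop-app"   // hypothetical project name
version := "0.1.0"
scalaVersion := "2.11.8"       // Spark 2.1.0 is published for Scala 2.11

// "provided": spark-submit supplies Spark at runtime,
// so Spark itself stays out of the packaged jar
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
```

With this in place, `sbt package` builds a jar that can be handed to spark-submit. To run directly with `sbt run` instead, the "provided" scope would likely need to be dropped so that Spark is on the run classpath.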

Agenda

  • Scala Crash Course
    • val, def, case class, class, object, functions
    • Initializing, setting options, starting and stopping SparkContext
    • Creating SparkSession
  • From DataFrames to RDDs
  • Transforming RDDs
    • Commonly-used APIs and lots of practical examples
  • Creating DataFrames from RDDs
  • Caching and Persistence
  • Manipulating DataFrames
  • Developing Scala applications using sbt
  • Spark Tools - spark-shell, spark-submit and web UI
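
The agenda items above can be sketched end to end in one small program. This is an illustrative sketch only, assuming Spark 2.x on the classpath and a local master; the `Person` case class, the `AgendaSketch` object, the `nextYear` helper and the sample data are all hypothetical and not part of the workshop material.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

// case class: an immutable record; Spark derives a schema from its fields
case class Person(name: String, age: Int)   // hypothetical sample type

// object: a singleton that holds the application entry point
object AgendaSketch {

  // def: a method; kept pure so it can be tested without Spark
  def nextYear(p: Person): Person = p.copy(age = p.age + 1)

  def main(args: Array[String]): Unit = {
    // val: an immutable reference. Creating SparkSession:
    val spark = SparkSession.builder()
      .appName("agenda-sketch")
      .master("local[*]")        // assumption: run locally on all cores
      .getOrCreate()
    import spark.implicits._     // enables toDF(), $"col" and the Person encoder

    // The underlying SparkContext is reachable from the session
    val sc = spark.sparkContext
    println(s"Running Spark ${sc.version}")

    val people: DataFrame = Seq(Person("Ann", 31), Person("Bob", 17)).toDF()

    // From DataFrames to RDDs
    val rdd: RDD[Person] = people.as[Person].rdd

    // Transforming RDDs
    val adults: RDD[Person] = rdd.map(nextYear).filter(_.age >= 18)

    // Creating DataFrames from RDDs
    val adultsDF: DataFrame = adults.toDF()

    // Caching and Persistence
    adultsDF.persist(StorageLevel.MEMORY_ONLY)

    // Manipulating DataFrames
    adultsDF.select($"name", $"age").orderBy($"age".desc).show()

    adultsDF.unpersist()
    spark.stop()                 // also stops the SparkContext
  }
}
```

In spark-shell or a Databricks notebook a session named `spark` already exists, so the builder and `spark.stop()` lines would be skipped there and the body pasted in directly.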

Prerequisites (1 of 3)

  1. Some programming experience in a modern programming language, e.g. Scala, Python, Java, F#

Prerequisites (2 of 3)

  1. Databricks Cloud Community Edition account
  2. Installed

Prerequisites (3 of 3)

  1. Downloaded

Questions?