Spark 2.0 / Scala Workshop agenda

Large-Scale Big Data Analytics

@jaceklaskowski / StackOverflow / GitHub / Mastering Apache Spark 2

https://github.com/jaceklaskowski

http://bit.ly/mastering-apache-spark

http://stackoverflow.com/users/1305344/jacek-laskowski

https://www.quora.com/profile/Jacek-Laskowski

https://medium.com/@jaceklaskowski

https://twitter.com/jaceklaskowski

Among contributors to Apache Spark 1.6

Among contributors to Apache Spark 2

Agenda

  • Day 1. The Elements of Apache Spark 2
    • With Scala 2.11.8
  • Day 2. Spark SQL
  • Day 3. Spark MLlib

Modules - Day 1 (1 of 2)

  • The Elements of Apache Spark (aka Why Spark)
  • Tools Intro - IntelliJ IDEA and Scala Worksheet
  • Introduction to Scala (aka Why Scala)
  • Spark Setup and Your First Spark Application
  • Exercise: My First Scala Application (using IntelliJ IDEA)

Modules - Day 1 (2 of 2)

  • Introduction to sbt and sbt-assembly
  • Exercise: Running Scala Application from Command Line
  • Introduction to Spark SQL using spark-shell and Spark's web UI
  • Exercise: My First Spark SQL Application (using IntelliJ IDEA)
  • Introduction to spark-submit and run-example
  • Exercise: spark-submit Your Spark App / run-example SparkPi

Modules - Day 2

  • Developing Spark SQL Applications using Datasets
  • More detailed agenda coming...

Modules - Day 3

  • Developing Machine Learning Pipelines using Spark MLlib
  • More detailed agenda coming...

Prerequisites

Questions?