Apache Spark 2 Workshop / 2 Days / Ljubljana

@jaceklaskowski / StackOverflow / GitHub / Mastering Apache Spark 2

https://github.com/jaceklaskowski

http://bit.ly/mastering-apache-spark

Among contributors to Apache Spark 1.6

Among contributors to Apache Spark 2

http://stackoverflow.com/users/1305344/jacek-laskowski

https://twitter.com/jaceklaskowski

Agenda - Day 1

  1. WARM-UP: Developing a Command-Line Spark Application
    • Using IntelliJ IDEA, Scala, sbt and spark-submit
  2. SparkSession, Dataset and Encoders
  3. Aggregations, Join and Window Operators
  4. Catalyst Query Optimizer

Agenda - Day 2

  1. Spark MLlib's ML Pipelines
  2. Structured Streaming (and Apache Kafka)
  3. Monitoring Using SparkListeners
  4. Spark Streaming's Stateful Operators (e.g. mapWithState)
  5. Kafka Integration Using the Direct API
  6. Spark Thrift Server - Spark's JDBC & ODBC Interface

Prerequisites

  1. Some programming experience in a modern programming language (preferably on the JVM)
    • Java, Python, Scala, C#
  2. Installed
  3. Downloaded
  4. Willingness to ask PLENTY of questions

Questions?