Apache Spark™
Advanced for Developers Workshop 5 Days

@jaceklaskowski / StackOverflow / GitHub
The "Internals" Books: Apache Spark / Spark SQL / Spark Structured Streaming

Jacek is best known for the online "Internals" books:

  1. The Internals of Apache Spark
  2. The Internals of Spark SQL
  3. The Internals of Spark Structured Streaming
  4. Mastering Kafka Streams
  5. Mastering Apache Kafka

Jacek is active on StackOverflow

Professional Objectives

  • Advance in solving analytical problems using Spark SQL
  • Explore the recent features of Apache Spark 2.4
  • Deep dive into the internals of Apache Spark and its modules (Spark SQL, Spark Structured Streaming and Spark MLlib)
  • Understand performance tuning of Apache Spark applications and the advanced features of Apache Spark

Training content

  • Anatomy of Spark Core Data Processing Platform
  • Foundations of Spark SQL
  • Internals of Structured Query Execution
  • Standard, User-Defined and User-Defined Aggregate Functions (Spark SQL)
  • Basic, Windowed and Multi-Dimensional Aggregations (Spark SQL)
  • Monitoring Spark Applications Using the web UI and SparkListeners
  • Join Optimization with Bucketing (Spark SQL)
  • Stream Processing with Spark Structured Streaming
  • Machine Learning with Spark MLlib

Prerequisites

Be prepared to get the most out of the workshop

Prerequisites / Experience

  1. Hands-on programming experience using Scala
  2. Experience developing Spark applications

Prerequisites / To Be Installed

In-Class Preparations

Make the Instructor's Life Slightly Easier. Thanks!

Introduce Yourself

  1. First name
  2. What's your experience with Spark?
  3. Any production experience with Spark?
  4. What do you expect from the workshop?
  5. Where do you want to be with Spark after 5 days?

Addendum

  1. Put your name on paper in front of you
    • Stick it to your laptop if possible
    • Use LARGE fonts
  2. 1-hour lunch break at 12:45pm