Apache Spark™ and Scala
for Experienced Oracle and SQL Developers (5 Days)

@jaceklaskowski / StackOverflow / GitHub
Books: Mastering Apache Spark / Spark Structured Streaming

  • Jacek Laskowski is an independent consultant
  • Specializing in Spark, Kafka, Kafka Streams, Scala
  • Development | Consulting | Training
  • Contributor to Apache Spark (since 1.6.0)
  • Contact me at jacek@japila.pl
  • Follow @JacekLaskowski on Twitter
    for more #ApacheSpark

Jacek is best known for his GitBooks:
  1. Mastering Apache Spark
  2. Mastering Spark SQL
  3. Spark Structured Streaming
  4. Mastering Kafka Streams
  5. Apache Kafka Notebook

Agenda

  • Day 1 — Just Enough Scala (with IntelliJ IDEA)
  • Day 2 — Foundations of Spark SQL
  • Day 3 — Aggregations and Joins
  • Day 4 — Advanced Apache Spark and Monitoring
  • Day 5 — Advanced Spark SQL and Spark MLlib

Day 1 — Just Enough Scala
With IntelliJ IDEA

  1. Scala — Just Enough to Develop Spark Applications
    • Getting familiar with the syntax and the Scala REPL
    • sbt console
  2. My First Scala Standalone Application
    • IntelliJ IDEA, sbt package and spark-submit
  3. Running Scala applications using java -jar
  4. Example: Changing column names in Dataset

Prerequisites

Be prepared to get the most out of the workshop

Prerequisites / Programming Experience

Some programming experience in a modern programming language, e.g. Scala, Python, Java, or F#

Prerequisites / To Be Installed

  1. Java Platform, Standard Edition (Java SE) 8
  2. IntelliJ IDEA Community Edition with Scala plugin
  3. sbt

Prerequisites / To Be Downloaded

  1. The latest version of Apache Spark

In-Class Preparations

Make the Instructor's Life Slightly Easier. Thanks!

Introduce Yourself

  1. First name
  2. What do you expect from the workshop?
  3. Where do you want to be with Spark after 5 days?

Addendum

  1. Write your name on a piece of paper and put it in front of you (or stick it to your laptop)
  2. Is lunch at 12:45pm OK?