Spark Structured Streaming

for Developers (1 Day)

Apache Spark 2.2

@jaceklaskowski / StackOverflow / GitHub
Books: Mastering Apache Spark / Spark Structured Streaming

https://bit.ly/mastering-apache-spark

©Jacek Laskowski 2017 / @jaceklaskowski / jacek@japila.pl

https://bit.ly/spark-structured-streaming

StackOverflow

Agenda (1 of 2)


  Exercise: Batch Processing JSON Datasets (Spark SQL)

  Introduction to Spark Structured Streaming

  Streaming Aggregations with groupBy

  Lunch Break (12:00pm)

Agenda (2 of 2)


  Fault Tolerance and Checkpointing

  Monitoring Streaming Queries (with StreamingQueryListeners)

  Structured Streaming's Internals

  Exercise: Developing Custom Streaming Sources

Prerequisites

Be prepared to get the most out of the workshop

Prerequisites / Programming Experience


  Experience with the Scala language (or Java or Python)

  Experience with the Dataset API and Spark SQL in general

  Experience with basic aggregations (groupBy) and joins

  Familiarity with the command line and spark-shell in particular

Prerequisites / To Be Installed


  Apache Spark 2.2.0

  Java Platform, Standard Edition (Java SE) 8

  IntelliJ IDEA Community Edition with Scala plugin

  sbt 1.0.3

  Apache Kafka 1.0.0

In-Class Preparations

Make the Instructor's Life Slightly Easier. Thanks!

Introduce Yourself


  First name

  What do you expect from the workshop?

Addendum


  Write your name on a piece of paper and put it in front of you (or stick it to your laptop)

  What time do you prefer for lunch? 12pm or 1pm?

Questions?


  Read the Spark Structured Streaming gitbook

  Read the Mastering Apache Spark 2 gitbook

  Follow @jaceklaskowski on Twitter

  Upvote my questions and answers on StackOverflow