Exercise: Batch Processing JSON Datasets (Spark SQL)
Introduction to Spark Structured Streaming
Streaming Aggregations with groupBy
Lunch Break (12:00pm)
Fault Tolerance and Checkpointing
Monitoring Streaming Queries (with StreamingQueryListeners)
Structured Streaming's Internals
Exercise: Developing Custom Streaming Sources
Experience with the Scala language (or Java or Python)
Experience with the Dataset API and Spark SQL in general
Experience with basic aggregations (groupBy) and joins
Familiarity with the command line and spark-shell in particular
Java Platform, Standard Edition (Java SE) 8
IntelliJ IDEA Community Edition with Scala plugin
First name
What do you expect from the workshop?
Write your name on a piece of paper and put it in front of you (stick it to your laptop?)
What time do you prefer for lunch? 12pm or 1pm?
Read the Spark Structured Streaming gitbook
Read the Mastering Apache Spark 2 gitbook
Follow @jaceklaskowski on Twitter
Upvote my questions and answers on Stack Overflow