Day 7 / Apr 12 (Tue)

Continuing the journey into the land of Spark Structured Streaming.

Morning Exercise

Exercise: Your First Standalone Structured Streaming Application

  1. Create a brand new project in IntelliJ IDEA
  2. Take the input directory to read files from as the first command-line argument (args(0))
    1. (advanced/optional) Use scopt to parse the input directory argument
  3. Run the application from command line using spark-submit
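
One possible starting point for the steps above is sketched below. It is a minimal sketch, not a reference solution: the object name `FirstStreamingApp` and the helper `inputDirFrom` are hypothetical, and the choice of the text file source and the console sink is an assumption, since the exercise does not prescribe either.

```scala
import org.apache.spark.sql.SparkSession

object FirstStreamingApp {

  // Hypothetical helper: the first command-line argument, if present.
  def inputDirFrom(args: Array[String]): Option[String] = args.headOption

  def main(args: Array[String]): Unit = {
    // Exit with a usage message when no input directory is given.
    val inputDir = inputDirFrom(args).getOrElse {
      System.err.println("Usage: FirstStreamingApp <input-directory>")
      sys.exit(1)
    }

    // No master is set here; spark-submit supplies it.
    val spark = SparkSession.builder()
      .appName("FirstStreamingApp")
      .getOrCreate()

    // Assumption: plain text files. The text source needs no user-defined
    // schema; every line becomes a row with a single `value` column.
    val lines = spark.readStream.text(inputDir)

    // Assumption: print every micro-batch to the console.
    val query = lines.writeStream
      .format("console")
      .option("truncate", false)
      .start()

    // Block until the query is stopped or fails.
    query.awaitTermination()
  }
}
```

Once packaged, the application could be run along the lines of `spark-submit --class FirstStreamingApp --master "local[*]" <your-application-jar> <input-directory>`, where both placeholders depend on your build and environment.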

Theory

  1. Spark Structured Streaming

Exercises

  1. Exercise: Finding Most Common Non-null Prefix per Group (Occurrences)
  2. Exercise: Finding First Non-Null Value per Group

Homework

  1. Read the scaladoc of org.apache.spark.sql.streaming.StreamingQuery