Day 7 / Apr 12 (Tue)
Continuing the journey into the land of Spark Structured Streaming.
Morning Exercise
Exercise: Your First Standalone Structured Streaming Application
- Create a brand new project in IntelliJ IDEA
- An input directory to read files from should be defined on the command line (args(0))
  - (advanced/optional) Use scopt for the input directory
- Run the application from the command line using spark-submit
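A minimal sketch of what such an application could look like. The object name FirstStreamingApp, the text file format, and the console sink are assumptions made for illustration; the exercise itself only fixes the input directory (args(0)) and spark-submit.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application name; the exercise does not prescribe one.
object FirstStreamingApp {
  def main(args: Array[String]): Unit = {
    // Fail fast when the input directory is not given on the command line
    if (args.isEmpty) {
      System.err.println("Usage: FirstStreamingApp <input-directory>")
      sys.exit(1)
    }
    val inputDir = args(0)

    // The master URL is left out on purpose so spark-submit can supply it
    val spark = SparkSession.builder()
      .appName("FirstStreamingApp")
      .getOrCreate()

    // Assumption: plain text files. The text source needs no user-defined
    // schema and produces a single string column named "value".
    val lines = spark.readStream
      .format("text")
      .load(inputDir)

    // Assumption: print every micro-batch to the console
    val query = lines.writeStream
      .format("console")
      .option("truncate", false)
      .start()

    // Block the main thread until the query stops or fails
    query.awaitTermination()
  }
}
```

Once packaged as a jar, it could be started with something like spark-submit --class FirstStreamingApp <path-to-jar> <input-directory>, where both placeholders depend on your build and environment. New files dropped into the input directory should then show up as console batches.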
Theory
Exercises
- Exercise: Finding Most Common Non-null Prefix per Group (Occurrences)
- Exercise: Finding First Non-Null Value per Group
Homework
- Read the scaladoc of org.apache.spark.sql.streaming.StreamingQuery