messages.writeStream
  .format("console")
  // define the checkpoint location
  .option("checkpointLocation", "path/to/checkpoint/dir")
  .start()
val q: StreamingQuery = spark.readStream...start
q.id // <-- same across restarts with checkpointing enabled
q.runId // <-- always unique
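The two identifiers can be compared across a restart with a small experiment. The following is a minimal sketch, not a definitive recipe: it assumes Spark SQL is on the classpath (for example in spark-shell), uses the built-in rate source as a stand-in for a real input, and the application name, temporary directory, and startQuery helper are hypothetical names introduced only for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

// In spark-shell this returns the existing session
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("checkpoint-demo") // hypothetical name
  .getOrCreate()

// Temporary directory standing in for "path/to/checkpoint/dir"
val checkpointDir = Files.createTempDirectory("checkpoint-demo").toString

// Hypothetical helper: starts the same query against the same checkpoint location
def startQuery(): StreamingQuery =
  spark.readStream
    .format("rate") // built-in test source that generates rows continuously
    .load()
    .writeStream
    .format("console")
    .option("checkpointLocation", checkpointDir)
    .start()

// First run: remember both identifiers, then stop the query
val first = startQuery()
val firstId = first.id
val firstRunId = first.runId
first.stop()

// Second run: same checkpoint location, so the query is restarted, not created anew
val second = startQuery()
val secondId = second.id
val secondRunId = second.runId
second.stop()

println(s"id:    $firstId vs $secondId")
println(s"runId: $firstRunId vs $secondRunId")
```

Pointing the second run at a different (or no) checkpoint location should make the comparison of the two id values come out differently, which is one way to approach the questions in the exercise.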
Write a streaming query with the checkpointLocation option
What happens when you stop the query and start over with checkpointing enabled?
What happens when you stop the query and start over without checkpointing?
What are the query identifiers?
What do the directory structure and the files in the checkpoint location look like?
What is the format of the files?
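For the last two questions, it helps to list everything under the checkpoint location before opening individual files. The following is a minimal sketch in plain Scala with no Spark dependency; listFiles is a hypothetical helper name, and the path is the placeholder used earlier, so replace it with the directory your query actually used.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: returns the regular files under root,
// as paths relative to root, in sorted order
def listFiles(root: Path): Seq[String] = {
  val stream = Files.walk(root)
  try {
    stream.toArray.toSeq
      .map(_.asInstanceOf[Path])
      .filter(p => Files.isRegularFile(p))
      .map(p => root.relativize(p).toString)
      .sorted
  } finally {
    stream.close() // Files.walk keeps directory handles open until closed
  }
}

// Placeholder path from the example above; nothing is printed if it does not exist
val root = Paths.get("path/to/checkpoint/dir")
if (Files.isDirectory(root)) listFiles(root).foreach(println)
```

Once the file names are known, opening a few of them in a text editor is usually enough to answer the question about their format.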
Read the Spark Structured Streaming gitbook
Read the Mastering Apache Spark 2 gitbook
Follow @jaceklaskowski on Twitter
Upvote my questions and answers on Stack Overflow