Fault Tolerance

(and Checkpointing)

Spark Structured Streaming

@jaceklaskowski / StackOverflow / GitHub
Books: Mastering Apache Spark / Spark Structured Streaming

Checkpointing

  1. Checkpointing is the process of storing query metadata (e.g. offsets) so that a query can recover from a failure or planned downtime (e.g. system maintenance or an application upgrade)
  2. Allows for fault-tolerant execution: a restarted query resumes where it left off
  3. Local checkpoints are created and deleted automatically
  4. External checkpoints are saved to persistent storage and are not deleted automatically
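To make "store offsets, then recover from them" concrete, here is a deliberately simplified, hypothetical model in plain Scala (no Spark). ToyCheckpoint and runQuery are made-up names, and the one-file-per-batch layout is not Spark's actual checkpoint format. Each batch records its end offset before it is processed, so a restarted run picks up right after the last recorded offset.

```scala
import java.io.File
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files

// HYPOTHETICAL toy checkpoint: one file per batch id holding that batch's end offset
// (illustration only, not Spark's real on-disk format)
final class ToyCheckpoint(dir: File) {
  dir.mkdirs()

  // Record the batch's end offset before the batch is processed (write-ahead)
  def writeOffset(batchId: Long, endOffset: Long): Unit =
    Files.write(new File(dir, batchId.toString).toPath, endOffset.toString.getBytes(UTF_8))

  // Latest recorded (batchId, endOffset), or None for a fresh checkpoint
  def latest: Option[(Long, Long)] = {
    val ids = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
      .map(_.getName)
      .filter(n => n.nonEmpty && n.forall(_.isDigit))
      .map(_.toLong)
    if (ids.isEmpty) None
    else {
      val last = ids.max
      val bytes = Files.readAllBytes(new File(dir, last.toString).toPath)
      Some(last -> new String(bytes, UTF_8).trim.toLong)
    }
  }
}

// Toy query over the source 0, 1, 2, ...: runs `batches` batches of `batchSize`
// records, resuming after the last checkpointed offset; returns the records processed
def runQuery(cp: ToyCheckpoint, batches: Int, batchSize: Long): Seq[Long] = {
  val (firstBatch, startOffset) =
    cp.latest.map { case (id, end) => (id + 1, end) }.getOrElse((0L, 0L))
  (0 until batches).flatMap { i =>
    val batchId = firstBatch + i
    val from = startOffset + i * batchSize
    val end = from + batchSize
    cp.writeOffset(batchId, end) // persist the offset first...
    from until end               // ...then "process" the records
  }
}

val cp = new ToyCheckpoint(Files.createTempDirectory("toy-checkpoint").toFile)
val firstRun = runQuery(cp, batches = 2, batchSize = 3)  // records 0 to 5
val secondRun = runQuery(cp, batches = 1, batchSize = 3) // "restart": resumes at 6, 7, 8
```

This toy would silently skip a batch that crashed after its offset was written. Spark avoids that by also keeping a commit log of completed batches, so an interrupted batch is re-run on restart, but the basic resume-from-the-last-offset principle is the same.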

Checkpoint Location

  1. Path to a directory where a streaming query writes checkpoint information for fault tolerance
  2. Should be on an HDFS-compatible, fault-tolerant file system
  3. Set using the checkpointLocation option
  4. 
      // messages: a streaming Dataset created with spark.readStream
      messages.writeStream
        .format("console")
        // define checkpoint location
        .option("checkpointLocation", "path/to/checkpoint/dir")
        .start()
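A runnable end-to-end version of this snippet, as a sketch: it assumes Spark 2.x or 3.x with spark-sql on the classpath, uses the built-in rate source as a stand-in for messages, and writes the checkpoint to a local temporary directory instead of a fault-tolerant file system (acceptable for a demo only).

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for experimenting (a real deployment would use a cluster master)
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("checkpoint-location-demo")
  .getOrCreate()

// Stand-in for messages: the rate source emits (timestamp, value) rows
val messages = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 1)
  .load()

// Demo only: a local temp directory instead of an HDFS-compatible, fault-tolerant path
val checkpointDir = Files.createTempDirectory("checkpoint-location-demo").toString

val query = messages.writeStream
  .format("console")
  .option("checkpointLocation", checkpointDir)
  .start()

// Let a few micro-batches run; returns false when the timeout elapses first
query.awaitTermination(5000)
query.stop()

// The checkpoint directory outlives the query and is reused by the next run
println(new File(checkpointDir).list().sorted.mkString(", "))
```

Starting the same query again with the same checkpointLocation continues from the stored offsets rather than from scratch.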

StreamingQuery Identifiers

  1. StreamingQuery.id remains the same across restarts with checkpointing enabled
  2. StreamingQuery.runId is generated anew on every start (regardless of checkpointing)
  3. 
    import org.apache.spark.sql.streaming.StreamingQuery
    val q: StreamingQuery = spark.readStream...start // ... elides the rest of the query
    q.id    // <-- same across restarts with checkpointing enabled
    q.runId // <-- always unique
  4. With checkpointing enabled, every time a query is restarted it keeps the same id but gets a new runId.
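The id/runId behaviour can be checked with a sketch like this one. It assumes spark-sql on the classpath and uses the rate source and console sink as stand-ins; runOnce is a hypothetical helper that starts a query, records its id and runId, and stops it.

```scala
import java.nio.file.Files
import java.util.UUID
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("query-identifiers")
  .getOrCreate()

// Hypothetical helper: starts a rate-to-console query (optionally with a
// checkpoint location), captures (id, runId), then stops the query
def runOnce(checkpointDir: Option[String]): (UUID, UUID) = {
  val writer = spark.readStream
    .format("rate")
    .load()
    .writeStream
    .format("console")
  val q: StreamingQuery = checkpointDir
    .fold(writer)(dir => writer.option("checkpointLocation", dir))
    .start()
  try (q.id, q.runId)
  finally q.stop()
}

val dir = Files.createTempDirectory("query-identifiers").toString

// Same checkpoint location twice: a restart of the same query
val (id1, runId1) = runOnce(Some(dir))
val (id2, runId2) = runOnce(Some(dir))

// No checkpointLocation: each start is an unrelated query
val (id3, _) = runOnce(None)
val (id4, _) = runOnce(None)

println(s"with checkpoint:    id $id1 -> $id2, runId $runId1 -> $runId2")
println(s"without checkpoint: id $id3 -> $id4")
```

Without checkpointLocation the console sink falls back to a fresh temporary checkpoint on every start, so nothing ties the two runs together and the id changes too.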

Exercise: Fault-Tolerant Streaming Query


  Write a streaming query with the checkpointLocation option

  What happens when you stop the query and start over with checkpointing enabled?

  What happens when you stop the query and start over without checkpointing?

  What are the query identifiers?

  What is the directory structure of the checkpoint location, and which files does it contain?

  What is the format of the files?
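A small helper for exploring the last two questions, as a sketch: it walks a checkpoint directory on the local file system and returns every file with its contents. checkpointFiles is a hypothetical name; it skips hidden files (e.g. .crc checksum files that Hadoop's local file system may write) and decodes bytes as UTF-8, so binary files show up garbled instead of failing. For checkpoints on HDFS or another remote store you would go through the Hadoop FileSystem API instead.

```scala
import java.io.File
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files

// Hypothetical helper: (relative path, contents) of every non-hidden file under root
def checkpointFiles(root: File): Seq[(String, String)] = {
  // Depth-first walk in name order, skipping hidden entries such as .crc files
  def walk(f: File): Seq[File] =
    if (f.isDirectory)
      Option(f.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
        .filterNot(_.getName.startsWith("."))
        .sortBy(_.getName)
        .flatMap(walk)
    else Seq(f)

  walk(root).map { f =>
    val relative = root.toPath.relativize(f.toPath).toString
    // new String(bytes, UTF_8) replaces malformed bytes instead of throwing
    relative -> new String(Files.readAllBytes(f.toPath), UTF_8)
  }
}

// Usage: print the layout and contents of a checkpoint directory
// checkpointFiles(new File("path/to/checkpoint/dir")).foreach { case (path, content) =>
//   println(s"== $path\n$content")
// }
```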

©Jacek Laskowski 2017 / @jaceklaskowski / jacek@japila.pl

Questions?


  Read the Spark Structured Streaming gitbook

  Read the Mastering Apache Spark 2 gitbook

  Follow @jaceklaskowski on Twitter

  Upvote my questions and answers on StackOverflow