Monitoring Streaming Queries

Spark Structured Streaming

@jaceklaskowski / StackOverflow / GitHub
Books: Mastering Apache Spark / Spark Structured Streaming

StreamingQueryListener

  1. StreamingQueryListener is the contract for listeners that want to be notified about the lifecycle events of a streaming query, i.e. its start, progress and termination

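package org.apache.spark.sql.streaming

abstract class StreamingQueryListener {
  // The event types are defined in the StreamingQueryListener companion object
  import StreamingQueryListener._

  def onQueryStarted(event: QueryStartedEvent): Unit       // a query was started
  def onQueryProgress(event: QueryProgressEvent): Unit     // a progress update was reported
  def onQueryTerminated(event: QueryTerminatedEvent): Unit // a query stopped (cleanly or with an error)
}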
          

onQueryStarted Callback

    
      def onQueryStarted(event: QueryStartedEvent): Unit
                
  1. onQueryStarted is triggered right after StreamExecution has started (running streaming batches)
  2. QueryStartedEvent holds id, runId and name
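
A minimal sketch of a listener that reacts only to query starts. The class name StartLoggingListener is hypothetical, and the comments on id, runId and name reflect how these fields usually behave rather than anything stated on the slide.

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Hypothetical listener that only logs query starts
class StartLoggingListener extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = {
    // id identifies the query, runId identifies this particular run of it
    // name may be null when the query was started without queryName(...)
    val name = Option(event.name).getOrElse("<unnamed>")
    println(s"Query started: id=${event.id} runId=${event.runId} name=$name")
  }

  // The other two callbacks are abstract, so they need (empty) bodies
  override def onQueryProgress(event: QueryProgressEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
}
```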

onQueryProgress Callback

    
      def onQueryProgress(event: QueryProgressEvent): Unit
                  
  1. onQueryProgress is triggered when ProgressReporter reports query progress (which is right after StreamExecution has finished a batch trigger)
  2. QueryProgressEvent holds a StreamingQueryProgress (on next slide)
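
A minimal sketch of reading the progress carried by the event. The class name ProgressLoggingListener is hypothetical, and the choice of fields to print is only an illustration.

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Hypothetical listener that only logs progress updates
class ProgressLoggingListener extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // event.progress is the StreamingQueryProgress of the reported trigger
    val p = event.progress
    println(
      s"batch=${p.batchId} inputRows=${p.numInputRows} " +
      s"inputRowsPerSecond=${p.inputRowsPerSecond} " +
      s"processedRowsPerSecond=${p.processedRowsPerSecond}")
  }

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
}
```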

StreamingQueryProgress

  1. StreamingQueryProgress holds information about the progress of a streaming query
  2. StreamingQuery.lastProgress is the most recent StreamingQueryProgress update
  3. StreamingQuery.recentProgress is an array of the most recent StreamingQueryProgress updates
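
A minimal sketch of polling progress directly from a StreamingQuery, without a listener. It assumes a local SparkSession, the built-in rate source and the console sink; the object name, query name and the 10-second wait are arbitrary choices for illustration.

```scala
import org.apache.spark.sql.SparkSession

object ProgressDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("progress-demo")
      .getOrCreate()

    // The rate source generates rows continuously, which is enough to produce progress updates
    val query = spark.readStream.format("rate").load()
      .writeStream.format("console").queryName("rate-to-console").start()

    // Let the query run for a while so that a few triggers can complete
    query.awaitTermination(10000)

    // lastProgress is null until the first progress update has been reported
    Option(query.lastProgress).foreach(p => println(p.prettyJson))

    // recentProgress holds the retained updates in an array
    query.recentProgress.foreach { p =>
      println(s"batch=${p.batchId} inputRows=${p.numInputRows}")
    }

    query.stop()
    spark.stop()
  }
}
```

How many updates recentProgress retains is controlled by the spark.sql.streaming.numRecentProgressUpdates configuration property.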

onQueryTerminated Callback

    
      def onQueryTerminated(event: QueryTerminatedEvent): Unit
                
  1. onQueryTerminated is triggered right before StreamExecution finishes running streaming batches (due to a stop or an exception)
  2. QueryTerminatedEvent holds id, runId and, if the query failed, the exception that led to the termination
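
A minimal sketch of telling a clean stop from a failure. It relies on the exception field of QueryTerminatedEvent being an Option[String]; the class name TerminationLoggingListener is hypothetical.

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Hypothetical listener that only logs terminations
class TerminationLoggingListener extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = ()

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    event.exception match {
      // The exception is available as text when the query terminated with an error
      case Some(error) => println(s"Query ${event.id} (run ${event.runId}) failed: $error")
      // No exception means the query was stopped
      case None        => println(s"Query ${event.id} (run ${event.runId}) stopped")
    }
}
```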

Registering StreamingQueryListener — addListener Method

    
      val queryListener: StreamingQueryListener = ...
      spark.streams.addListener(queryListener)
                  

De-registering StreamingQueryListener — removeListener Method

    
      val queryListener: StreamingQueryListener = ...
      spark.streams.removeListener(queryListener)
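
Putting addListener and removeListener together, an end-to-end run could look roughly like the sketch below. It assumes a local SparkSession, the built-in rate source and the console sink; the object name and the 5-second wait are arbitrary choices for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

object ListenerRegistrationDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("listener-registration-demo")
      .getOrCreate()

    // Anonymous listener that prints every lifecycle event
    val queryListener: StreamingQueryListener = new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit =
        println(s"started: ${event.id}")
      override def onQueryProgress(event: QueryProgressEvent): Unit =
        println(s"progress: batch ${event.progress.batchId}")
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
        println(s"terminated: ${event.id}")
    }

    // Register before starting the query so that the start event is not missed
    spark.streams.addListener(queryListener)

    val query = spark.readStream.format("rate").load()
      .writeStream.format("console").start()

    // Let the query run for a few seconds, then stop it
    query.awaitTermination(5000)
    query.stop()

    // De-register the listener once it is no longer needed
    spark.streams.removeListener(queryListener)
    spark.stop()
  }
}
```

Events are delivered to listeners asynchronously, so a listener removed right after stop() may not see the termination event.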
                  

Exercise: Custom StreamingQueryListener


  Use StreamingQueryListener to implement your own listener

  Observe how your listener intercepts the lifecycle events

©Jacek Laskowski 2017 / @jaceklaskowski / jacek@japila.pl

Questions?


  Read Spark Structured Streaming gitbook

  Read Mastering Apache Spark 2 gitbook

  Follow @jaceklaskowski on twitter

  Upvote my questions and answers on StackOverflow