Monitoring Spark using SparkListeners

Apache Spark 2.4.1 / Spark Core

@jaceklaskowski / StackOverflow / GitHub
The "Internals" Books: Apache Spark / Spark SQL / Spark Structured Streaming

Spark Listeners

  1. SparkListener is a developer API for intercepting events that the Spark scheduler posts, such as job, stage and task start and end.
  2. Switch to The Internals of Apache Spark
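A minimal sketch of a custom listener, assuming Spark 2.4 on the classpath. SparkListener provides no-op implementations of every callback, so a subclass overrides only the events it cares about. The class name JobCountingListener and its counters are hypothetical. Callbacks run on the listener bus thread, not on the thread that submitted the job, so the counters are atomic.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart, SparkListenerStageCompleted}

// Hypothetical example listener: counts jobs and prints stage completions.
// Every SparkListener callback is a no-op by default, so only three are overridden.
class JobCountingListener extends SparkListener {
  // Updated on the listener bus thread, possibly read from the driver's main thread.
  val jobsStarted = new AtomicInteger(0)
  val jobsEnded = new AtomicInteger(0)

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    jobsStarted.incrementAndGet()
    println(s"Job ${jobStart.jobId} started with ${jobStart.stageInfos.size} stage(s)")
  }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    jobsEnded.incrementAndGet()
    println(s"Job ${jobEnd.jobId} ended: ${jobEnd.jobResult}")
  }

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    println(s"Stage ${info.stageId} (${info.name}) completed with ${info.numTasks} task(s)")
  }
}
```

Register it programmatically with `sc.addSparkListener(new JobCountingListener)` before running actions. Events are delivered asynchronously, so the counters may lag slightly behind an action that has just returned.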

LiveListenerBus / spark.extraListeners

  1. LiveListenerBus is the single-JVM event bus in the driver that asynchronously delivers Spark events to registered SparkListeners.
  2. spark.extraListeners is a comma-separated list of SparkListener class names that are registered when SparkContext starts.
  3. Switch to The Internals of Apache Spark
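A sketch of registering a listener through spark.extraListeners, assuming a local master. Spark instantiates the listed classes reflectively while SparkContext starts, before the application-start event is posted, so each class needs either a zero-argument constructor or one that takes a SparkConf. AppStartLogger and ExtraListenersDemo are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationStart}

// Hypothetical listener using the SparkConf-taking constructor that
// spark.extraListeners supports (a no-arg constructor works too).
class AppStartLogger(conf: SparkConf) extends SparkListener {
  @volatile var startedApp: Option[String] = None

  override def onApplicationStart(event: SparkListenerApplicationStart): Unit = {
    startedApp = Some(event.appName)
    println(s"Application ${event.appName} started, spark.master=${conf.get("spark.master", "<unset>")}")
  }
}

object ExtraListenersDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("extra-listeners-demo")
      // Comma-separated, fully-qualified class names of listeners
      // to register when SparkContext starts.
      .set("spark.extraListeners", classOf[AppStartLogger].getName)
    val sc = new SparkContext(conf)
    try {
      sc.parallelize(1 to 100).count()
    } finally {
      sc.stop()
    }
  }
}
```

The same property can be passed on the command line, e.g. `--conf spark.extraListeners=...` with spark-submit or spark-shell, which keeps the application code free of listener wiring.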

EventLoggingListener and History Server

  1. EventLoggingListener is a SparkListener that writes JSON-encoded events to an event log file, one per application, when spark.eventLog.enabled is true.
  2. History Server is a web UI that replays event logs to display completed and running Spark applications.
  3. Switch to The Internals of Apache Spark
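A sketch of turning on event logging from code, assuming the default location file:///tmp/spark-events (any Hadoop-compatible URI should work). When spark.eventLog.enabled is true, SparkContext creates an EventLoggingListener that writes under spark.eventLog.dir; that directory must already exist, and the History Server should read the same location through spark.history.fs.logDirectory. The object EventLogConf and its helper withEventLog are hypothetical.

```scala
import org.apache.spark.SparkConf

object EventLogConf {
  // Hypothetical helper: returns the given conf with event logging switched on.
  def withEventLog(conf: SparkConf, dir: String): SparkConf =
    conf
      .set("spark.eventLog.enabled", "true")  // SparkContext then registers EventLoggingListener
      .set("spark.eventLog.dir", dir)         // base directory; must exist before the app starts
      .set("spark.eventLog.compress", "true") // optional: compress event log files
}
```

With event logs in place, `$SPARK_HOME/sbin/start-history-server.sh` starts the History Server, whose web UI listens on port 18080 by default.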

StatsReportListener — Logging Summary Statistics

  1. StatsReportListener is a SparkListener that logs summary statistics when each stage completes.
  2. Switch to The Internals of Apache Spark
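A sketch, assuming Spark 2.4 with its log4j 1.x logging. StatsReportListener is registered like any other extra listener, but it writes its summaries at INFO level, which spark-shell's default WARN log level hides. Raising only that logger's level makes the per-stage statistics visible. StatsReportDemo is a hypothetical name, and the job is arbitrary; it includes a shuffle so that two stages complete.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object StatsReportDemo {
  val conf: SparkConf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("stats-report-demo")
    // StatsReportListener has a no-arg constructor, so it can be listed directly.
    .set("spark.extraListeners", "org.apache.spark.scheduler.StatsReportListener")

  def main(args: Array[String]): Unit = {
    // StatsReportListener logs at INFO; let that logger's messages through.
    Logger.getLogger("org.apache.spark.scheduler.StatsReportListener").setLevel(Level.INFO)
    val sc = new SparkContext(conf)
    try {
      // reduceByKey introduces a shuffle, so two stages complete and get reported.
      sc.parallelize(1 to 1000, 4).map(n => (n % 10, n)).reduceByKey(_ + _).count()
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell the equivalent is `./bin/spark-shell --conf spark.extraListeners=org.apache.spark.scheduler.StatsReportListener` together with `log4j.logger.org.apache.spark.scheduler.StatsReportListener=INFO` in conf/log4j.properties.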

Exercise: Developing Custom SparkListener

  1. Creating a Scala/sbt project in IntelliJ IDEA
  2. Creating a Scala class, CustomSparkListener
  3. Creating a deployable package using sbt package
  4. Activating the SparkListener in spark-shell using spark.extraListeners
  5. Switch to The Internals of Apache Spark
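One possible CustomSparkListener for the exercise, as a sketch rather than the reference solution: it counts finished tasks, sums their executor run time, and prints one line per completed stage. The fields and the output format are assumptions. The class sits in the default package to keep its name short; if it lives in a package, spark.extraListeners needs the fully-qualified name.

```scala
import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

class CustomSparkListener extends SparkListener {
  // Number of task-end events seen so far.
  val tasksEnded = new AtomicLong(0L)
  // Total executor run time (ms) of the tasks that reported metrics.
  val totalExecutorRunTimeMs = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasksEnded.incrementAndGet()
    // taskMetrics may be null (e.g. for some failed tasks), so guard against it.
    Option(taskEnd.taskMetrics).foreach(m => totalExecutorRunTimeMs.addAndGet(m.executorRunTime))
  }

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    val status = info.failureReason.map(reason => s"failed: $reason").getOrElse("succeeded")
    println(s"[CustomSparkListener] stage ${info.stageId} '${info.name}' $status " +
      s"(${info.numTasks} tasks; ${totalExecutorRunTimeMs.get} ms executor run time so far)")
  }
}
```

After `sbt package`, the jar is written under target/scala-<version>/, and the Scala version must match the one Spark was built with (typically 2.11 for Spark 2.4.1). Pass it to spark-shell with `--jars <path-to-jar> --conf spark.extraListeners=CustomSparkListener`, then run any action to see the output.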