Debugging Spark

Apache Spark 2 / Spark SQL

@jaceklaskowski / StackOverflow / GitHub
Books: Mastering Apache Spark / Spark Structured Streaming

Agenda

  1. Wikipedia on Debugging
  2. Debugging Tools and Techniques
  3. Live Debugging Session

Wikipedia on Debugging

Debugging is the process of finding and resolving defects that prevent correct operation of computer software or a system.
from Debugging in Wikipedia, the free encyclopedia

Debugging - Numerous Facets

Numerous books have been written about debugging, as it involves numerous aspects, including interactive debugging, control flow, integration testing, log files, monitoring (application, system), memory dumps, profiling, Statistical Process Control, and special design tactics to improve detection while simplifying changes.
from Debugging in Wikipedia, the free encyclopedia

Debugging Tools and Techniques

  1. IntelliJ IDEA's Debugger
  2. Loggers (conf/log4j.properties)
  3. ScalaTest: prefer writing tests over debugging
  4. Debugging your Application (in Spark docs)
  5. Spark application's web UI, esp. SQL tab
  6. Explaining Logical and Physical Plans — explain Operator
  7. Debugging Query Execution (using debug package)
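
A minimal sketch of items 6 and 7 above, assuming Spark 2.x on the classpath and a local master. The object name ExplainAndDebugSketch, the helper buildQuery and the query itself are hypothetical and serve only as an illustration; explain, debug and debugCodegen are the Spark SQL operators the list refers to.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
// Brings the implicit debug() and debugCodegen() methods into scope
import org.apache.spark.sql.execution.debug._

object ExplainAndDebugSketch {

  // Hypothetical query, used only to have something to inspect
  def buildQuery(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.range(10)
      .where($"id" % 2 === 0)
      .groupBy(($"id" % 3).as("bucket"))
      .count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explain-and-debug-sketch")
      .getOrCreate()

    val q = buildQuery(spark)

    // Item 6: prints the logical plans (parsed, analyzed, optimized)
    // followed by the physical plan
    q.explain(extended = true)

    // Item 7: executes the query and prints per-operator statistics
    q.debug()

    // Item 7: prints the Java code generated for the whole-stage
    // codegen subtrees of the physical plan
    q.debugCodegen()

    spark.stop()
  }
}
```

The same calls can be pasted into spark-shell, where spark is already defined, so only the import of the debug package is needed.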

Live Debugging Session

  1. Debugging Spark (MLlib) Application in IntelliJ IDEA
  2. spark-shell and conf/log4j.properties
  3. "Debugging" ML Pipelines
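
For item 2 above, a sketch of what conf/log4j.properties might look like, assuming Spark 2.x with log4j 1.x and the console appender from conf/log4j.properties.template. The choice of org.apache.spark.sql.execution as the logger to turn up is an assumption made for illustration; any package or class name of interest works the same way.

```properties
# Keep everything quiet by default
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Assumption: query execution is the area under investigation
log4j.logger.org.apache.spark.sql.execution=DEBUG
```

Inside a running spark-shell, sc.setLogLevel("DEBUG") changes the root level without editing the file, at the cost of much noisier output than a per-logger setting.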

Recap

  1. Wikipedia on Debugging
  2. Debugging Tools and Techniques
  3. Live Debugging Session

Questions?