Spark and Cluster Managers

Apache Spark 2.4.1 / Spark Core

@jaceklaskowski / StackOverflow / GitHub
The "Internals" Books: Apache Spark / Spark SQL / Spark Structured Streaming

Agenda

  1. Spark Application = The Driver + Executors
  2. Master URL
  3. Deploy Mode
  4. spark-submit Command-Line Tool
  5. Spark Standalone
  6. ExternalClusterManager - Pluggable Cluster Managers
  7. Spark on YARN
  8. Spark on Mesos

Spark Application = The Driver + Executors

  1. A Spark application consists of the driver and executors (and your code, too!)
  2. Switch to The Internals of Apache Spark

Master URL

  1. The master URL points to the cluster manager that your Spark application is deployed to.
  2. Switch to The Internals of Apache Spark
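
A minimal sketch of how a master URL maps to a cluster manager. The function name `classify_master_url` and the returned labels are made up for illustration, and only the URL forms that match this deck's topics are recognized; Spark itself accepts more forms than this.

```python
import re


def classify_master_url(master: str) -> str:
    """Return a label for the cluster manager a master URL refers to.

    Illustrative only: covers local mode, Spark Standalone, YARN and Mesos.
    """
    # local, local[4], local[*] all run Spark inside a single JVM
    if master == "local" or re.fullmatch(r"local\[(\d+|\*)\]", master):
        return "local"
    # Spark Standalone masters are addressed as spark://host:port
    if master.startswith("spark://"):
        return "standalone"
    # YARN is selected by the bare word "yarn"
    if master == "yarn":
        return "yarn"
    # Mesos masters are addressed as mesos://host:port
    if master.startswith("mesos://"):
        return "mesos"
    # Anything else is outside the scope of this sketch
    raise ValueError(f"Master URL not handled by this sketch: {master!r}")


for url in ["local[*]", "spark://localhost:7077", "yarn", "mesos://localhost:5050"]:
    print(url, "->", classify_master_url(url))
```

The host names and ports above are placeholders, not values taken from a real cluster.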

Deploy Mode

  1. Deploy mode specifies where the driver runs once the application is deployed: on the machine that submits it (client) or on a node inside the cluster (cluster).
  2. Switch to The Internals of Apache Spark

spark-submit Command-Line Tool

  1. The spark-submit shell script lets you submit and manage your Spark applications.
  2. Switch to The Internals of Apache Spark
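
A hedged sketch of how the pieces above come together on a spark-submit command line. The helper `spark_submit_command`, the jar name and the class name are hypothetical; the `--master`, `--deploy-mode` and `--class` options are the ones spark-submit itself accepts. The helper only assembles the argument list and does not run anything.

```python
from typing import List, Sequence


def spark_submit_command(
    app_jar: str,
    main_class: str,
    master: str,
    deploy_mode: str = "client",  # spark-submit also defaults to client
    app_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a spark-submit argument list (illustrative helper)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"deploy mode must be client or cluster, got {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,      # the application jar comes after the options
        *app_args,    # everything after the jar is passed to the application
    ]


cmd = spark_submit_command(
    app_jar="my-app.jar",            # hypothetical jar
    main_class="com.example.MyApp",  # hypothetical main class
    master="yarn",
    deploy_mode="cluster",
)
print(" ".join(cmd))
# prints: spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar
```

To actually launch the application, the list could be handed to `subprocess.run(cmd)` on a machine where spark-submit is on the PATH.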

Spark Standalone

  1. Spark Standalone is Spark's own built-in cluster manager.
  2. Switch to The Internals of Apache Spark

ExternalClusterManager - Pluggable Cluster Managers

  1. ExternalClusterManager is a contract for pluggable cluster managers.
  2. Switch to The Internals of Apache Spark
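
The idea of a pluggable contract can be modelled in a few lines. This is a simplified Python analogue, not Spark's code: in Spark the contract is a Scala trait whose canCreate method says whether a manager handles a given master URL. The class names `ClusterManager`, `YarnLike` and `MesosLike` and the function `find_cluster_manager` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional


class ClusterManager(ABC):
    """Hypothetical stand-in for the pluggable cluster manager contract."""

    @abstractmethod
    def can_create(self, master_url: str) -> bool:
        """Tell whether this manager handles the given master URL."""


class YarnLike(ClusterManager):
    def can_create(self, master_url: str) -> bool:
        return master_url == "yarn"


class MesosLike(ClusterManager):
    def can_create(self, master_url: str) -> bool:
        return master_url.startswith("mesos://")


def find_cluster_manager(
    master_url: str, registered: Iterable[ClusterManager]
) -> Optional[ClusterManager]:
    """Pick the single registered manager that accepts the master URL."""
    matching = [m for m in registered if m.can_create(master_url)]
    if len(matching) > 1:
        # Two plugins claiming the same URL is treated as an error here
        raise RuntimeError(f"Multiple cluster managers accept {master_url!r}")
    return matching[0] if matching else None


managers = [YarnLike(), MesosLike()]
print(type(find_cluster_manager("yarn", managers)).__name__)  # prints: YarnLike
print(find_cluster_manager("spark://localhost:7077", managers))  # prints: None
```

The point of the design is that the caller never names a concrete manager: each plugin decides for itself whether a master URL belongs to it.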

Spark on YARN (1 of 2)

  1. Hadoop YARN (Yet Another Resource Negotiator) is the resource manager of the Apache Hadoop project.
  2. Switch to The Internals of Apache Spark

Spark on YARN (2 of 2)

Figure: Apache Hadoop YARN (from the official documentation of Apache Hadoop YARN)

Spark on Mesos (1 of 2)

  1. Apache Mesos is a distributed systems kernel with APIs for resource management and scheduling across entire datacenter and cloud environments.
  2. Switch to The Internals of Apache Spark

Spark on Mesos (2 of 2)

Figure: Apache Mesos (from the official documentation of Apache Mesos)

Recap

  1. Spark Application = The Driver + Executors
  2. Master URL
  3. Deploy Mode
  4. spark-submit Command-Line Tool
  5. Spark Standalone
  6. ExternalClusterManager - Pluggable Cluster Managers
  7. Spark on YARN
  8. Spark on Mesos