KubernetesClusterManager

KubernetesClusterManager is an ExternalClusterManager (Apache Spark) that can create scheduler components for k8s master URLs (master URLs that start with k8s).

KubernetesClusterManager is registered with Apache Spark using the META-INF/services/org.apache.spark.scheduler.ExternalClusterManager service file.
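The registration and master URL matching described above can be sketched roughly as follows. This is a minimal illustration, not Spark's code: ClusterManagerLike, K8sClusterManagerSketch and ClusterManagerLookup are hypothetical names, and a fixed list stands in for the java.util.ServiceLoader lookup that reads the service file.

```scala
// Hypothetical stand-in for Spark's ExternalClusterManager (URL check only).
trait ClusterManagerLike {
  def canCreate(masterURL: String): Boolean
}

// Stand-in for KubernetesClusterManager: accepts master URLs that start with "k8s".
object K8sClusterManagerSketch extends ClusterManagerLike {
  override def canCreate(masterURL: String): Boolean = masterURL.startsWith("k8s")
}

object ClusterManagerLookup {
  // Assumption: in Spark the candidates are discovered through
  // java.util.ServiceLoader; a fixed list keeps this sketch self-contained.
  private val registered: Seq[ClusterManagerLike] = Seq(K8sClusterManagerSketch)

  // Returns the first registered manager that accepts the master URL, if any.
  def find(masterURL: String): Option[ClusterManagerLike] =
    registered.find(_.canCreate(masterURL))
}
```

With this sketch, ClusterManagerLookup.find("k8s://https://localhost:6443") selects the Kubernetes stand-in, while a master URL such as yarn matches nothing.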

Creating Instance

KubernetesClusterManager takes no arguments to be created.

KubernetesClusterManager is created when:

Creating SchedulerBackend

createSchedulerBackend(
  sc: SparkContext,
  masterURL: String,
  scheduler: TaskScheduler): SchedulerBackend

createSchedulerBackend is part of the ExternalClusterManager (Apache Spark) abstraction.

createSchedulerBackend creates a KubernetesClusterSchedulerBackend.

Note

createSchedulerBackend assumes that the given TaskScheduler is TaskSchedulerImpl (Apache Spark).

createSchedulerBackend determines four internal values based on the spark.kubernetes.submitInDriver internal configuration property.

Internal value             | Enabled (true)                                       | Disabled (false)
---------------------------+------------------------------------------------------+---------------------------------
authConfPrefix             | spark.kubernetes.authenticate.driver.mounted         | spark.kubernetes.authenticate
apiServerUri               | spark.kubernetes.driver.master                       | Master URL with no k8s:// prefix
defaultServiceAccountToken | /var/run/secrets/kubernetes.io/serviceaccount/token  | (none)
defaultServiceAccountCaCrt | /var/run/secrets/kubernetes.io/serviceaccount/ca.crt | (none)
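The selection of the four values can be sketched as below. This is a simplified illustration under stated assumptions: a plain Map stands in for SparkConf, Settings and SchedulerBackendSettings are hypothetical names, the fallback for spark.kubernetes.driver.master is assumed to be https://kubernetes.default.svc, and the service account files are only used when they exist on disk.

```scala
import java.io.File

object SchedulerBackendSettings {
  // Hypothetical holder for the four internal values.
  final case class Settings(
      authConfPrefix: String,
      apiServerUri: String,
      defaultServiceAccountToken: Option[File],
      defaultServiceAccountCaCrt: Option[File])

  private val ServiceAccountDir = "/var/run/secrets/kubernetes.io/serviceaccount"

  // conf is a plain Map standing in for SparkConf (an assumption of this sketch).
  def resolve(conf: Map[String, String], masterURL: String): Settings = {
    val submitInDriver =
      conf.getOrElse("spark.kubernetes.submitInDriver", "false").toBoolean
    if (submitInDriver) {
      Settings(
        authConfPrefix = "spark.kubernetes.authenticate.driver.mounted",
        // Assumed default when the property is not set.
        apiServerUri = conf.getOrElse(
          "spark.kubernetes.driver.master", "https://kubernetes.default.svc"),
        // Mounted files are used only if they are present.
        defaultServiceAccountToken =
          Some(new File(s"$ServiceAccountDir/token")).filter(_.exists),
        defaultServiceAccountCaCrt =
          Some(new File(s"$ServiceAccountDir/ca.crt")).filter(_.exists))
    } else {
      Settings(
        authConfPrefix = "spark.kubernetes.authenticate",
        // Master URL with the k8s:// prefix removed.
        apiServerUri = masterURL.stripPrefix("k8s://"),
        defaultServiceAccountToken = None,
        defaultServiceAccountCaCrt = None)
    }
  }
}
```

For example, resolve(Map.empty, "k8s://https://localhost:6443") takes the disabled branch and yields https://localhost:6443 as the API server URI.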

Unless already defined, createSchedulerBackend sets the spark.kubernetes.executor.podNamePrefix configuration property to a prefix based on spark.app.name.
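A minimal sketch of this "set only if missing" behaviour is shown below. It assumes a mutable Map in place of SparkConf, and the resourceNamePrefix normalization is hypothetical: Spark's own helper is understood to do more (for example, appending a unique suffix), which is left out here.

```scala
import scala.collection.mutable

object ExecutorPodNamePrefix {
  val PrefixKey = "spark.kubernetes.executor.podNamePrefix"

  // Hypothetical normalization: lower-case the application name and replace
  // runs of characters outside [a-z0-9] with a single dash.
  def resourceNamePrefix(appName: String): String =
    appName.trim
      .toLowerCase(java.util.Locale.ROOT)
      .replaceAll("[^a-z0-9]+", "-")
      .stripPrefix("-")
      .stripSuffix("-")

  // Sets the prefix from spark.app.name unless a prefix is already defined.
  def setIfMissing(conf: mutable.Map[String, String]): Unit =
    if (!conf.contains(PrefixKey)) {
      conf.get("spark.app.name").foreach { name =>
        conf.update(PrefixKey, resourceNamePrefix(name))
      }
    }
}
```

A user-provided spark.kubernetes.executor.podNamePrefix is never overwritten in this sketch; the derived value is only a fallback.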

createSchedulerBackend creates a KubernetesClient for the Driver client type and the following:

With the spark.kubernetes.executor.podTemplateFile configuration property defined, createSchedulerBackend loads the pod spec from the pod template file, using the optional spark.kubernetes.executor.podTemplateContainerName configuration property.

In the end, createSchedulerBackend creates a KubernetesClusterSchedulerBackend with the following:

IllegalArgumentException

With spark.kubernetes.submitInDriver enabled, createSchedulerBackend requires that the name of the driver pod is configured (using the spark.kubernetes.driver.pod.name configuration property) and throws an IllegalArgumentException otherwise:

If the application is deployed using spark-submit in cluster mode, the driver pod name must be provided.
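The check can be sketched with Scala's require, which throws an IllegalArgumentException when the condition does not hold. The sketch assumes a plain Map in place of SparkConf, and DriverPodNameCheck is a hypothetical name.

```scala
object DriverPodNameCheck {
  // conf is a plain Map standing in for SparkConf (an assumption of this sketch).
  def check(conf: Map[String, String]): Unit = {
    val submitInDriver =
      conf.getOrElse("spark.kubernetes.submitInDriver", "false").toBoolean
    if (submitInDriver) {
      // require throws IllegalArgumentException when the condition is false.
      require(
        conf.contains("spark.kubernetes.driver.pod.name"),
        "If the application is deployed using spark-submit in cluster mode, " +
          "the driver pod name must be provided.")
    }
  }
}
```

Note that require prepends "requirement failed: " to the message of the exception it throws.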

Creating TaskScheduler

createTaskScheduler(
  sc: SparkContext,
  masterURL: String): TaskScheduler

createTaskScheduler is part of the ExternalClusterManager (Apache Spark) abstraction.

createTaskScheduler creates a TaskSchedulerImpl (Apache Spark).

Initializing Scheduling Components

initialize(
  scheduler: TaskScheduler,
  backend: SchedulerBackend): Unit

initialize is part of the ExternalClusterManager (Apache Spark) abstraction.

initialize requests the given TaskSchedulerImpl (Apache Spark) to initialize with the given SchedulerBackend (Apache Spark).
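The three methods of the abstraction are meant to be called in order: createTaskScheduler, createSchedulerBackend, then initialize. The sketch below illustrates that flow with hypothetical stand-in types (Scheduler, Backend, SchedulerImplSketch, BackendSketch, ClusterManagerSketch, Lifecycle); none of them are Spark classes, and the SparkContext argument is left out for brevity.

```scala
// Hypothetical stand-ins for Spark's TaskScheduler and SchedulerBackend.
trait Scheduler
trait Backend

// Stand-in for TaskSchedulerImpl: remembers the backend it was initialized with.
final class SchedulerImplSketch extends Scheduler {
  var backend: Option[Backend] = None
  def initialize(b: Backend): Unit = {
    backend = Some(b)
  }
}

// Stand-in for KubernetesClusterSchedulerBackend.
final class BackendSketch(val scheduler: SchedulerImplSketch) extends Backend

object ClusterManagerSketch {
  def createTaskScheduler(masterURL: String): Scheduler = new SchedulerImplSketch

  // Assumes the given scheduler is the concrete implementation, as the Note does.
  def createSchedulerBackend(masterURL: String, scheduler: Scheduler): Backend =
    new BackendSketch(scheduler.asInstanceOf[SchedulerImplSketch])

  def initialize(scheduler: Scheduler, backend: Backend): Unit =
    scheduler.asInstanceOf[SchedulerImplSketch].initialize(backend)
}

object Lifecycle {
  // Creates the scheduler, then the backend, then wires them together.
  def start(masterURL: String): (Backend, Scheduler) = {
    val scheduler = ClusterManagerSketch.createTaskScheduler(masterURL)
    val backend = ClusterManagerSketch.createSchedulerBackend(masterURL, scheduler)
    ClusterManagerSketch.initialize(scheduler, backend)
    (backend, scheduler)
  }
}
```

The casts reflect the assumption that the TaskScheduler handed back to the cluster manager is the one it created itself.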


Last update: 2021-01-27