KubernetesClusterManager
KubernetesClusterManager is an ExternalClusterManager (Apache Spark) that can create scheduler components for k8s master URLs.
KubernetesClusterManager is registered with Apache Spark using the META-INF/services/org.apache.spark.scheduler.ExternalClusterManager service file.
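The lookup by master URL can be sketched as follows. This is a minimal, self-contained illustration and not Spark's code: ClusterManagerSketch, K8sClusterManagerSketch and ClusterManagerLookup are hypothetical names, the registered list stands in for whatever the service file makes discoverable, and the "starts with k8s" check is an assumption about how a k8s master URL is recognized.

```scala
// Hypothetical, simplified stand-in for Spark's ExternalClusterManager
trait ClusterManagerSketch {
  def canCreate(masterURL: String): Boolean
}

object K8sClusterManagerSketch extends ClusterManagerSketch {
  // Assumption: a master URL that starts with "k8s" is treated as a k8s master URL
  override def canCreate(masterURL: String): Boolean = masterURL.startsWith("k8s")
}

object ClusterManagerLookup {
  // Stands in for the managers that would be discovered through the service file
  val registered: Seq[ClusterManagerSketch] = Seq(K8sClusterManagerSketch)

  // Returns the single manager that accepts the master URL, if there is one
  def find(masterURL: String): Option[ClusterManagerSketch] =
    registered.filter(_.canCreate(masterURL)) match {
      case Seq()       => None
      case Seq(single) => Some(single)
      case _ =>
        throw new IllegalStateException(s"Multiple cluster managers accept $masterURL")
    }
}
```

With this sketch, ClusterManagerLookup.find("k8s://https://example.com:6443") yields the Kubernetes stand-in, while a master URL such as local[*] yields None.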
Creating Instance
KubernetesClusterManager takes no arguments to be created.
KubernetesClusterManager is created when:
- SparkContext is requested for an ExternalClusterManager (when requested for a SchedulerBackend and TaskScheduler)
Creating SchedulerBackend
createSchedulerBackend(
  sc: SparkContext,
  masterURL: String,
  scheduler: TaskScheduler): SchedulerBackend
createSchedulerBackend is part of the ExternalClusterManager (Apache Spark) abstraction.
createSchedulerBackend creates a KubernetesClusterSchedulerBackend.
Note
createSchedulerBackend assumes that the given TaskScheduler is TaskSchedulerImpl (Apache Spark).
createSchedulerBackend determines four internal values based on the spark.kubernetes.submitInDriver internal configuration property.
| | spark.kubernetes.submitInDriver enabled (true) | spark.kubernetes.submitInDriver disabled (false) |
|---|---|---|
| authConfPrefix | spark.kubernetes.authenticate.driver.mounted | spark.kubernetes.authenticate |
| apiServerUri | spark.kubernetes.driver.master | Master URL with no k8s:// prefix |
| defaultServiceAccountToken | /var/run/secrets/kubernetes.io/serviceaccount/token | |
| defaultServiceAccountCaCrt | /var/run/secrets/kubernetes.io/serviceaccount/ca.crt | |
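The table can be read as a single conditional. The sketch below is an illustration under stated assumptions, not the actual implementation: SubmitInDriverSketch, K8sClientSettings and stripK8sPrefix are hypothetical names, a plain Map stands in for the Spark configuration, and the empty cells of the table are assumed to mean "no default file".

```scala
import java.io.File

object SubmitInDriverSketch {
  // Hypothetical container for the four internal values
  final case class K8sClientSettings(
      authConfPrefix: String,
      apiServerUri: String,
      defaultServiceAccountToken: Option[File],
      defaultServiceAccountCaCrt: Option[File])

  // Removes the k8s:// prefix from the master URL (left unchanged if the prefix is absent)
  def stripK8sPrefix(masterURL: String): String = masterURL.stripPrefix("k8s://")

  def resolve(conf: Map[String, String], masterURL: String): K8sClientSettings = {
    // Assumption: the property is treated as disabled when it is not set
    val submitInDriver =
      conf.getOrElse("spark.kubernetes.submitInDriver", "false").toBoolean
    if (submitInDriver) {
      K8sClientSettings(
        authConfPrefix = "spark.kubernetes.authenticate.driver.mounted",
        // Assumption of this sketch: the property must be present in the map
        apiServerUri = conf("spark.kubernetes.driver.master"),
        defaultServiceAccountToken =
          Some(new File("/var/run/secrets/kubernetes.io/serviceaccount/token")),
        defaultServiceAccountCaCrt =
          Some(new File("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")))
    } else {
      K8sClientSettings(
        authConfPrefix = "spark.kubernetes.authenticate",
        apiServerUri = stripK8sPrefix(masterURL),
        // Assumption: empty table cells mean that no default file is used
        defaultServiceAccountToken = None,
        defaultServiceAccountCaCrt = None)
    }
  }
}
```

Keeping the four values in one case class makes it harder to mix the two columns of the table by accident.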
Unless already defined, createSchedulerBackend sets the spark.kubernetes.executor.podNamePrefix configuration property to a prefix based on spark.app.name.
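The "unless already defined" behavior amounts to a set-if-missing step. The following is a rough sketch only: PodNamePrefixSketch and prefixFrom are hypothetical, a Map stands in for the Spark configuration, and the rule used to turn an application name into a prefix is an assumption made for illustration, not the rule Spark applies.

```scala
import java.util.Locale

object PodNamePrefixSketch {
  private val Key = "spark.kubernetes.executor.podNamePrefix"

  // Hypothetical rule: lowercase the name and replace every character
  // that is not a lowercase letter, a digit or a dash with a dash
  def prefixFrom(appName: String): String =
    appName.trim.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9-]", "-")

  // Keeps a user-provided prefix; otherwise derives one from spark.app.name
  def withExecutorPodNamePrefix(conf: Map[String, String]): Map[String, String] =
    if (conf.contains(Key)) conf
    else conf + (Key -> prefixFrom(conf("spark.app.name")))
}
```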
createSchedulerBackend creates a KubernetesClient for the Driver client type and the following:
- spark.kubernetes.namespace configuration property
- apiServerUri
- authConfPrefix
- defaultServiceAccountToken
- defaultServiceAccountCaCrt
With the spark.kubernetes.executor.podTemplateFile configuration property defined, createSchedulerBackend loads the pod spec from the pod template file with the optional spark.kubernetes.executor.podTemplateContainerName configuration property.
In the end, createSchedulerBackend creates a KubernetesClusterSchedulerBackend with the following:
- Java ScheduledExecutorService with kubernetes-executor-maintenance thread name
- ExecutorPodsSnapshotsStoreImpl with a Java ScheduledExecutorService with kubernetes-executor-snapshots-subscribers thread names and 2 threads
- ExecutorPodsPollingSnapshotSource with a Java ScheduledExecutorService with kubernetes-executor-pod-polling-sync thread name
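Named scheduled executors like the ones listed above can be built with the JDK alone. The sketch below is an assumption-laden illustration: NamedExecutorsSketch is a hypothetical helper, the threads are created as daemon threads by choice, and the numeric suffix for multi-thread pools is an assumed naming scheme.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

object NamedExecutorsSketch {
  // Creates daemon threads; single-thread executors use the name as-is,
  // pools append a counter so that every thread gets a distinct name
  private def daemonFactory(name: String, numbered: Boolean): ThreadFactory = {
    val counter = new AtomicInteger(0)
    new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val threadName =
          if (numbered) s"$name-${counter.getAndIncrement()}" else name
        val t = new Thread(r, threadName)
        t.setDaemon(true)
        t
      }
    }
  }

  def singleThread(name: String): ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(daemonFactory(name, numbered = false))

  def pool(name: String, threads: Int): ScheduledExecutorService =
    Executors.newScheduledThreadPool(threads, daemonFactory(name, numbered = true))
}
```

For example, NamedExecutorsSketch.singleThread("kubernetes-executor-maintenance") and NamedExecutorsSketch.pool("kubernetes-executor-snapshots-subscribers", 2) would mirror the thread names and the thread count given in the list.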
IllegalArgumentException
With spark.kubernetes.submitInDriver enabled, createSchedulerBackend asserts that the name of the driver pod is configured (using spark.kubernetes.driver.pod.name configuration property) or else throws an IllegalArgumentException:
If the application is deployed using spark-submit in cluster mode, the driver pod name must be provided.
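The assertion can be sketched with Scala's require, which throws an IllegalArgumentException when its condition is false. DriverPodNameCheckSketch is a hypothetical name, and a Map stands in for the Spark configuration.

```scala
object DriverPodNameCheckSketch {
  // Throws IllegalArgumentException when submitInDriver is enabled
  // but no driver pod name is configured
  def check(conf: Map[String, String]): Unit = {
    // Assumption: the property is treated as disabled when it is not set
    val submitInDriver =
      conf.getOrElse("spark.kubernetes.submitInDriver", "false").toBoolean
    if (submitInDriver) {
      require(
        conf.contains("spark.kubernetes.driver.pod.name"),
        "If the application is deployed using spark-submit in cluster mode, " +
          "the driver pod name must be provided.")
    }
  }
}
```

Note that require prepends "requirement failed: " to the message it is given.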
Creating TaskScheduler
createTaskScheduler(
  sc: SparkContext,
  masterURL: String): TaskScheduler
createTaskScheduler is part of the ExternalClusterManager (Apache Spark) abstraction.
createTaskScheduler creates a TaskSchedulerImpl (Apache Spark).
Initializing Scheduling Components
initialize(
  scheduler: TaskScheduler,
  backend: SchedulerBackend): Unit
initialize is part of the ExternalClusterManager (Apache Spark) abstraction.
initialize requests the given TaskSchedulerImpl (Apache Spark) to initialize with the given SchedulerBackend (Apache Spark).
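Since initialize receives a TaskScheduler but needs a TaskSchedulerImpl, a downcast is implied. The following sketch shows the shape of that step with hypothetical stand-in types (Backend, Scheduler, SchedulerImpl); it is not Spark's code and assumes, as the note above does, that the scheduler is always the expected implementation.

```scala
object InitializeSketch {
  trait Backend   // stand-in for SchedulerBackend
  trait Scheduler // stand-in for TaskScheduler

  // Stand-in for TaskSchedulerImpl: remembers the backend it was initialized with
  final class SchedulerImpl extends Scheduler {
    var backend: Option[Backend] = None
    def initialize(b: Backend): Unit = backend = Some(b)
  }

  // Downcasts the scheduler and hands it the backend;
  // fails with ClassCastException for any other Scheduler implementation
  def initialize(scheduler: Scheduler, backend: Backend): Unit =
    scheduler.asInstanceOf[SchedulerImpl].initialize(backend)
}
```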