# ExecutorPodsLifecycleManager
## Creating Instance

ExecutorPodsLifecycleManager takes the following to be created:

- SparkConf
- KubernetesClient
- ExecutorPodsSnapshotsStore
- Guava Cache
ExecutorPodsLifecycleManager is created when KubernetesClusterManager is requested for a SchedulerBackend (and creates a KubernetesClusterSchedulerBackend).
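A minimal sketch of the class declaration, assuming the four arguments above map one-to-one onto constructor parameters (the parameter names and the cache's type parameters are assumptions):

```scala
import com.google.common.cache.Cache
import io.fabric8.kubernetes.client.KubernetesClient
import org.apache.spark.SparkConf

// Sketch only: parameter names and the Cache type parameters are assumptions.
private[spark] class ExecutorPodsLifecycleManager(
    conf: SparkConf,
    kubernetesClient: KubernetesClient,
    snapshotsStore: ExecutorPodsSnapshotsStore,
    removedExecutorsCache: Cache[java.lang.Long, java.lang.Long])
```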
## Configuration Properties

### spark.kubernetes.executor.eventProcessingInterval
ExecutorPodsLifecycleManager uses the spark.kubernetes.executor.eventProcessingInterval configuration property when started as the processing interval of the snapshot subscriber, i.e. how often to process executor pod snapshots.
### spark.kubernetes.executor.deleteOnTermination
ExecutorPodsLifecycleManager uses the spark.kubernetes.executor.deleteOnTermination configuration property as the deleteFromK8s flag of onFinalNonDeletedState.
## Missing Pod Timeout
ExecutorPodsLifecycleManager defines Missing Pod Timeout based on the spark.kubernetes.executor.missingPodDetectDelta configuration property.
ExecutorPodsLifecycleManager uses the timeout to detect lost executor pods when handling executor pods snapshots.
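A minimal sketch of reading the three configuration properties above from the SparkConf (the val names and default values are assumptions; Spark defines these keys via its Kubernetes config constants):

```scala
import org.apache.spark.SparkConf

// Sketch only: the keys come from the sections above; defaults are assumptions.
class LifecycleSettings(conf: SparkConf) {
  // Processing interval of the snapshot subscriber (see Starting below)
  val eventProcessingInterval: Long =
    conf.getTimeAsMs("spark.kubernetes.executor.eventProcessingInterval", "1s")

  // deleteFromK8s flag for onFinalNonDeletedState
  val shouldDeleteExecutors: Boolean =
    conf.getBoolean("spark.kubernetes.executor.deleteOnTermination", defaultValue = true)

  // Missing Pod Timeout to detect lost executor pods
  val missingPodDetectDelta: Long =
    conf.getTimeAsMs("spark.kubernetes.executor.missingPodDetectDelta", "30s")
}
```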
## Starting

```scala
start(
  schedulerBackend: KubernetesClusterSchedulerBackend): Unit
```
start requests the ExecutorPodsSnapshotsStore to add a subscriber (with the spark.kubernetes.executor.eventProcessingInterval processing interval) to intercept state changes in executor pods (see the sketch below).
start is used when KubernetesClusterSchedulerBackend is started.
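A minimal sketch of start, assuming ExecutorPodsSnapshotsStore exposes an addSubscriber(processingIntervalMillis)(handler)-style hook (the exact signature is an assumption):

```scala
// Sketch only: addSubscriber's signature is an assumption based on the
// description above; eventProcessingInterval is the configured interval.
def start(schedulerBackend: KubernetesClusterSchedulerBackend): Unit = {
  snapshotsStore.addSubscriber(eventProcessingInterval) { snapshots =>
    onNewSnapshots(schedulerBackend, snapshots)
  }
}
```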
## Processing Executor Pods Snapshots

```scala
onNewSnapshots(
  schedulerBackend: KubernetesClusterSchedulerBackend,
  snapshots: Seq[ExecutorPodsSnapshot]): Unit
```
onNewSnapshots creates an empty execIdsRemovedInThisRound collection of executors to be removed.

onNewSnapshots walks over the input ExecutorPodsSnapshots and branches off based on ExecutorPodState (see the sketch after this list):

- For PodDeleted, onNewSnapshots prints out the following DEBUG message to the logs:

    ```text
    Snapshot reported deleted executor with id [execId], pod name [state.pod.getMetadata.getName]
    ```

    onNewSnapshots calls removeExecutorFromSpark and adds the executor ID to the execIdsRemovedInThisRound local collection.

- For PodFailed, onNewSnapshots prints out the following DEBUG message to the logs:

    ```text
    Snapshot reported failed executor with id [execId], pod name [state.pod.getMetadata.getName]
    ```

    onNewSnapshots calls onFinalNonDeletedState (recording the executor ID in the execIdsRemovedInThisRound local collection).

- For PodSucceeded, onNewSnapshots asks the input KubernetesClusterSchedulerBackend whether the executor is still active (isExecutorActive). If so, onNewSnapshots prints out the following INFO message to the logs:

    ```text
    Snapshot reported succeeded executor with id [execId], even though the application has not requested for it to be removed.
    ```

    Otherwise, onNewSnapshots prints out the following DEBUG message to the logs:

    ```text
    Snapshot reported succeeded executor with id [execId], pod name [state.pod.getMetadata.getName]
    ```

    onNewSnapshots calls onFinalNonDeletedState (recording the executor ID in the execIdsRemovedInThisRound local collection).
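A condensed sketch of the branching above. ExecutorPodsSnapshot is assumed to expose its pods as a Map of executor ID to ExecutorPodState, logging is assumed to come from Spark's Logging trait, and shouldDeleteExecutors stands for the spark.kubernetes.executor.deleteOnTermination flag; all of these are assumptions beyond what this page documents:

```scala
// Sketch only: condenses the walkthrough above; exact signatures are assumptions.
def onNewSnapshots(
    schedulerBackend: KubernetesClusterSchedulerBackend,
    snapshots: Seq[ExecutorPodsSnapshot]): Unit = {
  val execIdsRemovedInThisRound = scala.collection.mutable.HashSet.empty[Long]
  snapshots.foreach { snapshot =>
    snapshot.executorPods.foreach { case (execId, state) =>
      state match {
        case deleted @ PodDeleted(_) =>
          logDebug(s"Snapshot reported deleted executor with id $execId, " +
            s"pod name ${state.pod.getMetadata.getName}")
          removeExecutorFromSpark(schedulerBackend, deleted, execId)
          execIdsRemovedInThisRound += execId

        case failed @ PodFailed(_) =>
          logDebug(s"Snapshot reported failed executor with id $execId, " +
            s"pod name ${state.pod.getMetadata.getName}")
          // deleteFromK8s comes from spark.kubernetes.executor.deleteOnTermination
          if (onFinalNonDeletedState(failed, execId, schedulerBackend, shouldDeleteExecutors)) {
            execIdsRemovedInThisRound += execId
          }

        case succeeded @ PodSucceeded(_) =>
          if (schedulerBackend.isExecutorActive(execId.toString)) {
            logInfo(s"Snapshot reported succeeded executor with id $execId, " +
              "even though the application has not requested for it to be removed.")
          } else {
            logDebug(s"Snapshot reported succeeded executor with id $execId, " +
              s"pod name ${state.pod.getMetadata.getName}")
          }
          if (onFinalNonDeletedState(succeeded, execId, schedulerBackend, shouldDeleteExecutors)) {
            execIdsRemovedInThisRound += execId
          }

        case _ => // non-final states (e.g. pending, running) need no action here
      }
    }
  }
}
```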
## onFinalNonDeletedState

```scala
onFinalNonDeletedState(
  podState: FinalPodState,
  execId: Long,
  schedulerBackend: KubernetesClusterSchedulerBackend,
  deleteFromK8s: Boolean): Boolean
```
onFinalNonDeletedState calls removeExecutorFromSpark (and records the deleted return flag to be returned in the end).

With the given deleteFromK8s flag enabled, onFinalNonDeletedState calls removeExecutorFromK8s.
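A minimal sketch that follows the description above directly:

```scala
// Sketch only: removeExecutorFromSpark is assumed to return whether the
// executor was actually removed (the "deleted" flag).
def onFinalNonDeletedState(
    podState: FinalPodState,
    execId: Long,
    schedulerBackend: KubernetesClusterSchedulerBackend,
    deleteFromK8s: Boolean): Boolean = {
  val deleted = removeExecutorFromSpark(schedulerBackend, podState, execId)
  if (deleteFromK8s) {
    removeExecutorFromK8s(execId, podState.pod)
  }
  deleted
}
```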
## removeExecutorFromSpark

```scala
removeExecutorFromSpark(
  schedulerBackend: KubernetesClusterSchedulerBackend,
  podState: FinalPodState,
  execId: Long): Boolean
```
removeExecutorFromSpark...FIXME
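Pending the FIXME above, one plausible sketch: deduplicate on the removed-executors Guava Cache (the constructor argument), derive an exit reason from the final pod state, and ask the scheduler backend to remove the executor. findExitReason and doRemoveExecutor are assumptions, not documented on this page:

```scala
// Plausible sketch only: removedExecutorsCache, findExitReason and
// doRemoveExecutor are assumptions, not documented on this page.
def removeExecutorFromSpark(
    schedulerBackend: KubernetesClusterSchedulerBackend,
    podState: FinalPodState,
    execId: Long): Boolean = {
  if (removedExecutorsCache.getIfPresent(execId) == null) {
    removedExecutorsCache.put(execId, execId)
    val exitReason = findExitReason(podState, execId) // hypothetical helper
    schedulerBackend.doRemoveExecutor(execId.toString, exitReason)
    true
  } else {
    false // already removed in an earlier round
  }
}
```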
## removeExecutorFromK8s

```scala
removeExecutorFromK8s(
  execId: Long,
  updatedPod: Pod): Unit
```
removeExecutorFromK8s...FIXME
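Pending the FIXME above, one plausible sketch, assuming the method deletes the executor pod through the KubernetesClient and tolerates non-fatal errors:

```scala
import io.fabric8.kubernetes.api.model.Pod
import org.apache.spark.util.Utils

// Plausible sketch only: the delete-by-name call and the error handling
// are assumptions, not documented on this page.
def removeExecutorFromK8s(execId: Long, updatedPod: Pod): Unit = {
  Utils.tryLogNonFatalError {
    kubernetesClient
      .pods()
      .withName(updatedPod.getMetadata.getName)
      .delete()
  }
}
```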
## Logging
Enable ALL logging level for org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager logger to see what happens inside.
Add the following line to conf/log4j.properties:
```text
log4j.logger.org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager=ALL
```
Refer to Logging.