
Demo: Running Spark Application on minikube

This demo shows how to deploy a Spark application to Kubernetes (using minikube).

Start Cluster

Unless already started, start minikube.

minikube start
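minikube's default resources can be tight for Spark workloads. If you want to be explicit, you can start the cluster with more CPUs and memory (the values below are merely an example) and check that it is up:

# Start minikube with explicit resources (example values)
minikube start --cpus 4 --memory 8192

# Confirm the cluster is running
minikube status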

Build Spark Application Image

Make sure you've got a Spark image available to minikube's Docker daemon.

Point the shell to minikube's Docker daemon and make sure the Spark image (that your Spark application project uses as the base image) is there.

eval $(minikube -p minikube docker-env)
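To double-check that the shell now talks to minikube's Docker daemon (and not your local one), you can print the daemon's name; it is expected to be minikube:

# The name of the Docker daemon the shell is pointed at
docker info --format '{{.Name}}'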

List the Spark image.

docker images spark
REPOSITORY   TAG          IMAGE ID       CREATED             SIZE
spark        v3.1.1-rc2   e64950545e8f   About an hour ago   509MB

Use this image as the base image of your Spark application image:

FROM spark:v3.1.1-rc2

In your Spark application project, execute the following command to build a Docker image of the application and publish it to minikube's Docker daemon.

sbt clean 'set Docker/dockerRepository in `meetup-spark-app` := None' meetup-spark-app/docker:publishLocal

List the images and make sure that the image of your Spark application project is available.

docker images 'meetup*'
REPOSITORY         TAG       IMAGE ID       CREATED          SIZE
meetup-spark-app   0.1.0     3a867debc6c0   11 seconds ago   524MB

docker image inspect

Use the docker image inspect command to display detailed information about the Spark application image.

docker image inspect meetup-spark-app:0.1.0
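The full JSON output is fairly long. If you are only after selected attributes, a Go template via --format keeps it short, e.g. just the entrypoint (a sketch):

# Print only the entrypoint of the image
docker image inspect meetup-spark-app:0.1.0 --format '{{.Config.Entrypoint}}'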

docker image history

Use the docker image history command to show the history (layers) of the Spark application image.

docker image history meetup-spark-app:0.1.0
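The layer commands are truncated by default. --no-trunc and --format can make the history easier to read (a sketch):

# Show the full commands that created each layer
docker image history meetup-spark-app:0.1.0 --no-trunc --format '{{.CreatedBy}}'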

Create Kubernetes Resources

Create required Kubernetes resources to run a Spark application.

Spark official documentation

Learn more from the Spark official documentation.

A namespace is optional, but submitting a Spark application without a service account and a cluster role binding with proper permissions leads to the following exception message:

Forbidden!Configured service account doesn't have access. Service account may have been revoked.

Declaratively

Use the following rbac.yml file.

apiVersion: v1
kind: Namespace
metadata:
  name: spark-demo
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
  namespace: spark-demo
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: spark-demo
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
---

Create the resources in the Kubernetes cluster.

k create -f k8s/rbac.yml

Tip

With the declarative approach (using rbac.yml), cleaning up becomes as simple as k delete -f k8s/rbac.yml.

Imperatively

k create ns spark-demo
k create serviceaccount spark -n spark-demo
k create clusterrolebinding spark-role \
  --clusterrole edit \
  --serviceaccount spark-demo:spark \
  -n spark-demo
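Whichever approach you choose, you can verify that the spark service account is allowed to create pods in the spark-demo namespace before submitting anything (this should print yes):

# Check the permissions of the spark service account
k auth can-i create pods \
  --as system:serviceaccount:spark-demo:spark \
  -n spark-demo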

Submit Spark Application to minikube

cd $SPARK_HOME
K8S_SERVER=$(k config view --output=jsonpath='{.clusters[].cluster.server}')
export POD_NAME=meetup-spark-app
export IMAGE_NAME=$POD_NAME:0.1.0

Note the configuration properties below (some, e.g. spark.kubernetes.driver.pod.name, are not strictly necessary, but they make the demo easier to follow).

./bin/spark-submit \
  --master k8s://$K8S_SERVER \
  --deploy-mode cluster \
  --name $POD_NAME \
  --class meetup.SparkApp \
  --conf spark.kubernetes.container.image=$IMAGE_NAME \
  --conf spark.kubernetes.driver.pod.name=$POD_NAME \
  --conf spark.kubernetes.context=minikube \
  --conf spark.kubernetes.namespace=spark-demo \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --verbose \
  local:///opt/spark/jars/meetup.meetup-spark-app-0.1.0.jar THIS_STOPS_THE_DRIVER
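While spark-submit waits for the application to finish, you can watch the driver pod from another terminal (a simple sketch):

# Watch the driver pod go from Pending through Running to Completed
k get po -n spark-demo -w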

If all went fine, you should soon see a termination reason: Completed message.

21/02/09 14:14:30 INFO LoggingPodStatusWatcherImpl: State changed, new state:
     pod name: meetup-spark-app
     namespace: spark-demo
     labels: spark-app-selector -> spark-e4f6628f5c384a4dbdddf4a0b51c3fbe, spark-role -> driver
     pod uid: 5f0de3fd-f366-4c05-9866-d51c4b5dfc93
     creation time: 2021-02-09T13:14:20Z
     service account name: spark
     volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-hqc6k
     node name: minikube
     start time: 2021-02-09T13:14:20Z
     phase: Succeeded
     container status:
         container name: spark-kubernetes-driver
         container image: meetup-spark-app:0.1.0
         container state: terminated
         container started at: 2021-02-09T13:14:21Z
         container finished at: 2021-02-09T13:14:30Z
         exit code: 0
         termination reason: Completed
21/02/09 14:14:30 INFO LoggingPodStatusWatcherImpl: Application status for spark-e4f6628f5c384a4dbdddf4a0b51c3fbe (phase: Succeeded)
21/02/09 14:14:30 INFO LoggingPodStatusWatcherImpl: Container final statuses:


     container name: spark-kubernetes-driver
     container image: meetup-spark-app:0.1.0
     container state: terminated
     container started at: 2021-02-09T13:14:21Z
     container finished at: 2021-02-09T13:14:30Z
     exit code: 0
     termination reason: Completed

Accessing web UI

While the driver pod is running, forward the local port 4040 to access the web UI of the Spark application.

k port-forward -n spark-demo $POD_NAME 4040:4040
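With the port forwarded, the web UI is available at http://localhost:4040. The same port also serves Spark's REST API (the jq at the end is just for pretty-printing and assumed to be installed):

# List the application(s) via the status REST API
curl -s http://localhost:4040/api/v1/applications | jq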

Accessing Logs

Access the logs of the driver.

k logs -f $POD_NAME -n spark-demo
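The driver pod is not deleted when the application finishes, so the logs stay around. For example, to look at just the last lines after completion (a sketch):

# Show the last 20 lines of the (possibly completed) driver
k logs $POD_NAME -n spark-demo --tail 20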

Reviewing Spark Application Configuration (ConfigMap)

k get cm -n spark-demo
k describe cm [driverPod]-conf-map -n spark-demo
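To see the exact configuration the driver was started with, you can also print the ConfigMap itself (the name follows the [driverPod]-conf-map pattern above):

# Dump the Spark configuration stored in the driver's ConfigMap
k get cm [driverPod]-conf-map -n spark-demo -o yaml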

Describe the driver pod and review volumes (.spec.volumes) and volume mounts (.spec.containers[].volumeMounts).

k describe po $POD_NAME -n spark-demo
k get po $POD_NAME -n spark-demo -o=jsonpath='{.spec.volumes}' | jq
[
  {
    "emptyDir": {},
    "name": "spark-local-dir-1"
  },
  {
    "configMap": {
      "defaultMode": 420,
      "name": "spark-docker-example-f76bf776ec818be5-driver-conf-map"
    },
    "name": "spark-conf-volume"
  },
  {
    "name": "spark-token-24krm",
    "secret": {
      "defaultMode": 420,
      "secretName": "spark-token-24krm"
    }
  }
]
k get po $POD_NAME -n spark-demo -o=jsonpath='{.spec.containers[].volumeMounts}' | jq
[
  {
    "mountPath": "/var/data/spark-b5d0a070-ff9a-41a3-91aa-82059ceba5b0",
    "name": "spark-local-dir-1"
  },
  {
    "mountPath": "/opt/spark/conf",
    "name": "spark-conf-volume"
  },
  {
    "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
    "name": "spark-token-24krm",
    "readOnly": true
  }
]

Spark Application Management

K8S_SERVER=$(kubectl config view --output=jsonpath='{.clusters[].cluster.server}')
./bin/spark-submit --status "spark-demo:$POD_NAME" --master k8s://$K8S_SERVER
Application status (driver):
     pod name: meetup-spark-app
     namespace: spark-demo
     labels: spark-app-selector -> spark-0df2be7b2d8d40299e7a406564c9833c, spark-role -> driver
     pod uid: 30a749a3-1060-49a7-b502-4a054ea33d30
     creation time: 2021-02-09T13:18:14Z
     service account name: spark
     volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-hqc6k
     node name: minikube
     start time: 2021-02-09T13:18:14Z
     phase: Running
     container status:
         container name: spark-kubernetes-driver
         container image: meetup-spark-app:0.1.0
         container state: running
         container started at: 2021-02-09T13:18:15Z
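
spark-submit can also terminate a running application. A sketch of killing the driver, using the same spark-demo:podName format as --status:

# Request deletion of the driver pod (and so the application)
./bin/spark-submit --kill "spark-demo:$POD_NAME" --master k8s://$K8S_SERVER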

Listing Services

k get services -n spark-demo
NAME                                               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
spark-docker-example-3de43976e3a46fcf-driver-svc   ClusterIP   None         <none>        7078/TCP,7079/TCP,4040/TCP   101s

Clean Up

Clean up the cluster as described in Demo: spark-shell on minikube.
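Alternatively, a minimal clean-up right here could look as follows (assuming the declarative rbac.yml above; the --unset flag of minikube docker-env detaches the shell from minikube's Docker daemon):

# Deleting the RBAC resources also removes the spark-demo namespace
# (and the completed driver pod in it)
k delete -f k8s/rbac.yml

# Point the shell back at the local Docker daemon
eval $(minikube -p minikube docker-env --unset)

# Optionally stop the cluster
minikube stop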

That's it. Congratulations!

