Demo: Running Spark Examples on Google Kubernetes Engine

This demo shows how to run the official Spark Examples on a Kubernetes cluster on Google Kubernetes Engine (GKE).

This demo focuses on the ubiquitous SparkPi example, but should let you run the other sample Spark applications too.

./bin/run-example SparkPi 10

Before you begin

Open up a Google Cloud project in the Google Cloud Console and enable the Kubernetes Engine API (as described in Deploying a containerized web application).

Review Demo: Running Spark Examples on minikube to build a basic understanding of the process of deploying Spark applications to a local Kubernetes cluster using minikube.

Prepare Spark Base Image

export PROJECT_ID=$(gcloud info --format='value(config.project)')
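The build and push commands below read $GCP_CR, which is not set anywhere on this page. A minimal sketch of one way to define it, assuming the gcr.io Container Registry host (regional hosts such as eu.gcr.io follow the same pattern); the my-project fallback is a hypothetical placeholder:

```shell
# Assumption: PROJECT_ID was exported in the previous step;
# "my-project" is only a placeholder so the snippet runs standalone.
PROJECT_ID="${PROJECT_ID:-my-project}"

# Assumption: images live under the gcr.io host of Container Registry.
export GCP_CR="gcr.io/${PROJECT_ID}"

echo "Registry prefix: ${GCP_CR}"
```

Any later command that uses $GCP_CR then resolves to a path under that registry prefix.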

Build Spark Image

Build and push an Apache Spark base image to Container Registry on Google Cloud Platform.

./bin/ \
  -r $GCP_CR \
  -t v3.2.1 \

List the images using the docker images command, filtered by repository and printed as a two-column table.

docker images "$GCP_CR/*" --format "table {{.Repository}}\t{{.Tag}}"
REPOSITORY                                 TAG   v3.2.1

Push Spark Image to Container Registry

Push the container image to the Container Registry so that a GKE cluster can run it in a pod (as described in Pushing the Docker image to Container Registry).

gcloud auth configure-docker
./bin/ \
  -r $GCP_CR \
  -t v3.2.1 \

List Images

Use gcloud container images list to list the Spark image in the repository.

gcloud container images list --repository $GCP_CR

List Tags

Use gcloud container images list-tags to list tags and digests for the specified image.

gcloud container images list-tags $GCP_CR/spark
9a50d1435bbe  v3.2.1  2021-01-26T13:02:11

Describe Spark Image

Use gcloud container images describe to list information about the Spark image.

gcloud container images describe $GCP_CR/spark:v3.2.1
  digest: sha256:9a50d1435bbe81dd3a23d3e43c244a0bfc37e14fb3754b68431cbf8510360b84
  repository: spark-on-kubernetes-2021/spark

Create Kubernetes Cluster

export CLUSTER_NAME=spark-examples-cluster

The default version of Kubernetes varies per Google Cloud zone and is often older than the latest stable release. The cluster version can be changed using the --cluster-version option.

Use the gcloud container get-server-config command to check which Kubernetes versions are available and which one is the default in your zone.

gcloud container get-server-config


Use the latest version alias to get the highest supported Kubernetes version currently available on GKE in the cluster's zone or region.

gcloud container clusters create $CLUSTER_NAME \

Wait a few minutes until the GKE cluster is ready. At the end, you should see a summary of the cluster.
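The create command above is cut off after the line continuation. A sketch of what a complete invocation could look like, assuming the latest alias mentioned above and the europe-west3-b zone that appears in the sample cluster listing; the command is only printed so it can be reviewed before being run:

```shell
CLUSTER_NAME="${CLUSTER_NAME:-spark-examples-cluster}"
# Assumption: the zone is taken from the sample cluster listing; adjust to yours.
ZONE="${ZONE:-europe-west3-b}"

# --cluster-version=latest selects the highest Kubernetes version GKE supports in the zone.
CREATE_CMD="gcloud container clusters create ${CLUSTER_NAME} --cluster-version=latest --zone=${ZONE}"

# Print for review; run it with: eval "$CREATE_CMD"
echo "$CREATE_CMD"
```

Printing first makes it easy to add further options (for example --machine-type or --num-nodes) before anything is created.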

List Clusters

gcloud container clusters list
spark-examples-cluster  europe-west3-b  1.20.7-gke.1800  e2-medium     1.20.7-gke.1800  3          RUNNING

Config View

Review the configuration of the GKE cluster (k is used throughout as a shell alias for kubectl).

k config view

Compute Instances

Review the cluster's VM instances.

gcloud compute instances list

Run SparkPi on GKE


What follows is a more succinct version of Demo: Running Spark Application on minikube.

Create Kubernetes Resources

Use the following YAML configuration file (k8s/rbac.yml) to create the required resources.

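# Namespace that holds all the demo resources
apiVersion: v1
kind: Namespace
metadata:
  name: spark-demo
---
# Service account the Spark driver pod runs as
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-demo
---
# Grants the built-in edit cluster role to the spark service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
  namespace: spark-demo
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: spark-demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit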

Use k create to create the Kubernetes resources.

k create -f k8s/rbac.yml
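To check that the binding took effect, kubectl auth can-i can impersonate the service account and ask whether it may create pods, which is what the Spark driver needs in order to launch executors. A sketch, with can_spark_create_pods as a hypothetical helper name and KUBECTL as an overridable variable (handy for dry runs):

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: asks the API server whether the spark service account
# in the spark-demo namespace is allowed to create pods.
can_spark_create_pods() {
  "$KUBECTL" auth can-i create pods \
    --namespace spark-demo \
    --as system:serviceaccount:spark-demo:spark
}

# Usage (prints "yes" or "no"):
#   can_spark_create_pods
```

Setting KUBECTL=echo prints the command instead of contacting the cluster.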

Submit SparkPi

export K8S_SERVER=$(kubectl config view --output=jsonpath='{.clusters[].cluster.server}')
export POD_NAME=spark-examples-pod
export SPARK_IMAGE=$GCP_CR/spark:v3.2.1

Before the real spark-submit happens, open another terminal and watch the pods being created and terminated while the Spark application starts up and shuts down. Don't forget to use the spark-demo namespace.

k get po -n spark-demo -w


For the time being, we're going to use spark-submit rather than run-example. See Demo: Running Spark Examples on minikube for more information.

./bin/spark-submit \
  --master k8s://$K8S_SERVER \
  --deploy-mode cluster \
  --name $POD_NAME \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.driver.request.cores=400m \
  --conf spark.kubernetes.executor.request.cores=100m \
  --conf spark.kubernetes.container.image=$SPARK_IMAGE \
  --conf spark.kubernetes.driver.pod.name=$POD_NAME \
  --conf spark.kubernetes.namespace=spark-demo \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --verbose \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar 10


The spark.kubernetes.*.request.cores configuration properties are required because the default machine type of a GKE cluster is too small CPU-wise. You may consider another machine type for a GKE cluster (e.g. c2-standard-4).
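The reasoning can be sketched as a millicore comparison: a pod can be scheduled on a node only if its CPU request does not exceed the CPU the node has allocatable. The fits helper is hypothetical, and 940m is an illustrative figure for a small node, not a value measured in this demo; read the real one with kubectl get nodes -o jsonpath='{.items[*].status.allocatable.cpu}'. The sketch also ignores CPU already requested by other pods on the node.

```shell
# Hypothetical helper: succeeds when a CPU request (e.g. 400m) fits into
# the allocatable CPU of a node (e.g. 940m). Both values are in millicores.
fits() {
  [ "${1%m}" -le "${2%m}" ]
}

# Illustrative allocatable CPU of a small node (an assumption, not measured).
ALLOCATABLE=940m

# Without the request.cores overrides, Spark requests one full core (1000m) per pod.
fits 1000m "$ALLOCATABLE" || echo "1000m does not fit into $ALLOCATABLE"
fits 400m "$ALLOCATABLE" && echo "400m (the driver request used above) fits into $ALLOCATABLE"
```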

In the end, review the logs.

k logs -n spark-demo $POD_NAME
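SparkPi writes its result to the driver log as a line starting with Pi is roughly. A small sketch for pulling that line out of the log stream; pi_line is a hypothetical helper name:

```shell
# Hypothetical helper: keeps only the SparkPi result line from log text on stdin.
pi_line() {
  grep "Pi is roughly"
}

# Usage against the driver pod (k is the kubectl alias used on this page):
#   k logs -n spark-demo "$POD_NAME" | pi_line
```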

KubernetesClientException: pods "spark-examples-pod" already exists

While executing the demo you may run into the following exception:

Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: Message: pods "spark-examples-pod" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=pods, name=spark-examples-pod, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "spark-examples-pod" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

As the error message says:

Message: pods "spark-examples-pod" already exists.

Delete the driver pod before running the demo again. Driver pods are kept after a Spark application finishes so that you can review their logs or configuration.

k delete po -n spark-demo $POD_NAME
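To make repeated runs painless, the delete can be issued before every submit; with --ignore-not-found, kubectl delete returns successfully even when no such pod exists. A sketch, with delete_driver_pod as a hypothetical helper and KUBECTL overridable for dry runs:

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: removes a leftover driver pod from the spark-demo
# namespace and does not fail when the pod is already gone.
delete_driver_pod() {
  "$KUBECTL" delete pod "$1" --namespace spark-demo --ignore-not-found
}

# Usage before each spark-submit:
#   delete_driver_pod "$POD_NAME"
```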

Clean Up

Delete the GKE cluster.

gcloud container clusters delete $CLUSTER_NAME --quiet

Delete the images.

gcloud container images delete $SPARK_IMAGE --force-delete-tags --quiet