Demo: Running Spark Structured Streaming on minikube¶
This demo shows how to run a Spark Structured Streaming application on minikube:
- spark.kubernetes.submission.waitAppCompletion configuration property
- spark-submit --status
- spark-submit --kill
Before you begin¶
It is assumed that you are familiar with the basics of Spark on Kubernetes and the other demos.
Start Cluster¶
Start minikube.
minikube start
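Spark drivers and executors may need more resources than minikube's defaults, so you may want to size the cluster explicitly. The values below are just an example (--cpus and --memory are regular minikube start options):
minikube start --cpus=4 --memory=8g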
Build Spark Application Image¶
Make sure you've got a Spark image available in minikube's Docker registry. Learn the steps in Demo: spark-shell on minikube.
Point the shell to minikube's Docker daemon.
eval $(minikube -p minikube docker-env)
List the Spark image. Make sure it matches the version of Spark you want to work with.
docker images spark
REPOSITORY TAG IMAGE ID CREATED SIZE
spark v3.2.1 e64950545e8f About an hour ago 509MB
Publish the image of the Spark Structured Streaming application. This step is project-dependent; this demo's project uses sbt with the sbt-native-packager plugin.
sbt clean docker:publishLocal
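For reference, a minimal build.sbt of such a project could look like the following. Consider it a sketch only (all values are assumptions derived from the image name and version used in this demo, not the project's actual build):
// build.sbt (a hypothetical sketch)
// JavaAppPackaging and DockerPlugin come from sbt-native-packager
// and provide the docker:publishLocal task used above.
enablePlugins(JavaAppPackaging, DockerPlugin)

name := "spark-streams-demo"
version := "0.1.0"
scalaVersion := "2.12.15"

// Spark classes are provided by the base image at runtime
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % Provided

// Base the application image on the Spark image built earlier
dockerBaseImage := "spark:v3.2.1"
Docker / packageName := "spark-streams-demo"
The project is also expected to place the application jar under /opt/spark/jars (that is where the spark-submit command below looks it up), which is omitted in this sketch.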
List the images and make sure that the image of your Spark application project is available.
docker images spark-streams-demo
REPOSITORY TAG IMAGE ID CREATED SIZE
spark-streams-demo 0.1.0 20145c134ca9 4 minutes ago 515MB
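The application code itself is out of scope of this demo. For illustration only, a never-ending Structured Streaming application in the spirit of meetup.SparkStreamsApp could be as simple as the following sketch (the rate source and console sink are assumptions, not necessarily what the demo project does):
package meetup

import org.apache.spark.sql.SparkSession

object SparkStreamsApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("spark-streams-demo")
      .getOrCreate()

    // The rate source produces rows indefinitely, so the query never finishes
    val rates = spark.readStream
      .format("rate")
      .load()

    // Print every micro-batch to the console and run until stopped
    val query = rates.writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}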
Submit Spark Application to minikube¶
Switch to your Spark installation directory and look up the address of the Kubernetes API server (k in this and the other demos is an alias of kubectl):
cd $SPARK_HOME
K8S_SERVER=$(k config view --output=jsonpath='{.clusters[].cluster.server}')
Make sure that the Kubernetes resources (e.g. a namespace and a service account) are available in the cluster. Learn more in Demo: Running Spark Application on minikube.
k create -f k8s/rbac.yml
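If you don't have the k8s/rbac.yml file around, equivalent resources can be created with kubectl directly. This is a sketch only; the namespace and service account names match what this demo uses, and the edit cluster role follows the Spark on Kubernetes documentation:
k create ns spark-demo
k create sa spark -n spark-demo
k create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-demo:spark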
The name of the pod is going to be based on the name of the container image for demo purposes. Pick what works for you.
export POD_NAME=spark-streams-demo
export IMAGE_NAME=$POD_NAME:0.1.0
You may optionally delete all pods (since we use a fixed name for the demo).
k delete po --all
One of the differences between streaming and batch Spark applications is that a Spark Structured Streaming application is expected to run forever (or until explicitly stopped). That's why the demo turns off the spark.kubernetes.submission.waitAppCompletion configuration property, so spark-submit returns right after submission instead of waiting for the (never-ending) application to finish.
./bin/spark-submit \
--master k8s://$K8S_SERVER \
--deploy-mode cluster \
--name $POD_NAME \
--class meetup.SparkStreamsApp \
--conf spark.kubernetes.container.image=$IMAGE_NAME \
--conf spark.kubernetes.driver.pod.name=$POD_NAME \
--conf spark.kubernetes.context=minikube \
--conf spark.kubernetes.namespace=spark-demo \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.submission.waitAppCompletion=false \
--verbose \
local:///opt/spark/jars/meetup.spark-streams-demo-0.1.0.jar
In the end, you should be given a so-called submission ID that you are going to use with the spark-submit tool (which relies on the K8SSparkSubmitOperation extension under the covers).
INFO LoggingPodStatusWatcherImpl: Deployed Spark application spark-streams-demo with submission ID spark-demo:spark-streams-demo into Kubernetes
Take note of the submission ID, as that is how you are going to monitor the application with spark-submit --status (and possibly kill it with spark-submit --kill).
export SUBMISSION_ID=spark-demo:spark-streams-demo
Once submitted, observe the pods in another terminal. Make sure you use the spark-demo namespace.
k get po -w -n spark-demo
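You may also want to follow the logs of the driver pod (k logs with -f follows the log stream; the pod name comes from spark.kubernetes.driver.pod.name above):
k logs -f spark-streams-demo -n spark-demo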
Request Status of Spark Application¶
Use spark-submit --status SUBMISSION_ID to request the status of the Spark driver in cluster deploy mode.
./bin/spark-submit \
--master k8s://$K8S_SERVER \
--status $SUBMISSION_ID
You should see something similar to the following:
Submitting a request for the status of submission spark-demo:spark-streams-demo in k8s://https://127.0.0.1:55004.
21/01/18 12:16:27 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Application status (driver):
pod name: spark-streams-demo
namespace: spark-demo
labels: spark-app-selector -> spark-46ca76cc77c242509f27af3c506eb1f5, spark-role -> driver
pod uid: 034ed206-5804-4e9d-ab68-ec56a7678b65
creation time: 2021-01-18T11:09:46Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume, spark-token-888gj
node name: minikube
start time: 2021-01-18T11:09:46Z
phase: Running
container status:
container name: spark-kubernetes-driver
container image: spark-streams-demo:0.1.0
container state: running
container started at: 2021-01-18T11:09:47Z
Kill Spark Application¶
In the end, you can use spark-submit --kill to stop the Spark Structured Streaming application.
./bin/spark-submit \
--master k8s://$K8S_SERVER \
--kill $SUBMISSION_ID
You should see something similar to the following:
Submitting a request to kill submission spark-demo:spark-streams-demo in k8s://https://127.0.0.1:55004. Grace period in secs: not set.
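Optionally, verify that the driver pod is indeed gone:
k get po -n spark-demo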
Clean Up¶
Clean up the cluster as described in Demo: spark-shell on minikube.
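In short (assuming you don't need the cluster anymore), that boils down to stopping or deleting minikube:
minikube stop
# or
minikube delete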
That's it. Congratulations!