Demo: Running PySpark Application on minikube¶
This demo shows how to deploy a PySpark application to Kubernetes (using minikube).
Before you begin¶
It is assumed that you have finished the following:
Demo: spark-shell on minikube
Start Cluster¶
Unless already started, start minikube.
minikube start
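A Spark application can be fairly resource-hungry, so you may want to give minikube more CPUs and memory than the defaults. A minimal sketch (the values are illustrative, not a requirement of this demo):
# 4 CPUs and 8 GB of memory (adjust to your machine)
minikube start --cpus 4 --memory 8192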
Build PySpark Image¶
In a separate terminal, switch to your Spark installation.
cd $SPARK_HOME
Tip
Review kubernetes/dockerfiles/spark
(in your Spark installation) or resource-managers/kubernetes/docker/src/main/dockerfiles/spark
(in the Spark source code).
docker-image-tool¶
Build and publish the PySpark image. Note the -m
option that makes the shell script use minikube's Docker daemon.
./bin/docker-image-tool.sh \
-m \
-t v3.2.1 \
-p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile \
build
docker images¶
Point the shell to minikube's Docker daemon.
eval $(minikube -p minikube docker-env)
List the Spark image.
docker images spark-py
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
spark-py     v3.2.1   f15c947dea88   32 seconds ago   1.17GB
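docker-image-tool.sh builds the base spark image first and then builds spark-py on top of it, so the base image should be available as well:
docker images spark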
Build Spark Application Image¶
Point the shell to minikube's Docker daemon and make sure the Spark image (that your Spark application project uses) is available.
eval $(minikube -p minikube docker-env)
Use this image in the Dockerfile
of your Spark application:
FROM spark-py:v3.2.1
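What follows is a minimal sketch of such a Dockerfile. The demo.py name and the /pf-search directory are assumptions that simply mirror the local:///pf-search/demo.py path used with spark-submit below; adjust them to your project.
FROM spark-py:v3.2.1
# Copy the PySpark application into the image
# (so it can be referenced as local:///pf-search/demo.py at submit time)
COPY demo.py /pf-search/demo.py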
Build and push the Docker image of your Spark application project to minikube's Docker repository.
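A minimal sketch of the build step, assuming the Dockerfile above sits in the current directory. With the shell pointed at minikube's Docker daemon (docker-env above), building the image is enough; there is no separate registry to push to.
# The name and tag match the image listed below
docker build -t pf-demo:v1.0.0 .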
List the images and make sure that the image of your Spark application project is available.
docker images pf-demo
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
pf-demo      v1.0.0   7ff4769b11c1   21 seconds ago   1.17GB
Create Kubernetes Resources¶
Create required Kubernetes resources to run a Spark application.
Spark official documentation
Learn more in the official documentation of Apache Spark (Running Spark on Kubernetes).
Make sure to create the required Kubernetes resources (a service account and a cluster role binding) as without them you will run into the following exception message:
Forbidden!Configured service account doesn't have access. Service account may have been revoked.
Declaratively¶
Use the following rbac.yml
file (from the Spark on Kubernetes Demos project).
apiVersion: v1
kind: Namespace
metadata:
name: spark-demo
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: spark
namespace: spark-demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: spark-role
namespace: spark-demo
subjects:
- kind: ServiceAccount
name: spark
namespace: spark-demo
roleRef:
kind: ClusterRole
name: edit
apiGroup: rbac.authorization.k8s.io
---
Create the resources in the Kubernetes cluster.
k create -f rbac.yml
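You can double-check that the service account exists and is allowed to manage pods (which is what the driver needs), e.g. with kubectl auth can-i:
k get serviceaccount spark -n spark-demo
# Should print "yes" thanks to the edit cluster role bound above
k auth can-i create pods \
  --as=system:serviceaccount:spark-demo:spark \
  -n spark-demo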
Tip
With the declarative approach (using rbac.yml), cleaning up becomes as simple as k delete -f rbac.yml.
Submit Spark Application to minikube¶
cd $SPARK_HOME
K8S_SERVER=$(k config view --output=jsonpath='{.clusters[].cluster.server}')
export POD_NAME=pf-demo
export IMAGE_NAME=$POD_NAME:v1.0.0
Please note the configuration properties below (some are not strictly necessary, but they make the demo easier to follow, e.g. spark.kubernetes.driver.pod.name).
./bin/spark-submit \
--master k8s://$K8S_SERVER \
--deploy-mode cluster \
--name pf-demo \
--conf spark.kubernetes.container.image=$IMAGE_NAME \
--conf spark.kubernetes.context=minikube \
--conf spark.kubernetes.driver.pod.name=pf-demo \
--conf spark.kubernetes.namespace=spark-demo \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.file.upload.path=/tmp \
--conf spark.ui.enabled=false \
--driver-java-options="-Dlog4j.configuration=file:conf/log4j.properties" \
--conf spark.sql.extensions=pl.japila.spark.sql.MySparkSessionExtension \
local:///pf-search/demo.py
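While spark-submit runs, you can watch the driver and executor pods come and go in another terminal:
k get pods -n spark-demo -w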
Accessing web UI¶
Note that the spark-submit command above disables the web UI (spark.ui.enabled=false). Remove that configuration property and submit again if you want to access the web UI of the driver.
k port-forward $POD_NAME 4040:4040 -n spark-demo
Open http://localhost:4040.
Accessing Logs¶
Access the logs of the driver.
k logs -f pf-demo -n spark-demo
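Executor pods are labelled spark-role=executor (the driver is spark-role=driver, as the application status below shows), so a label selector gives you the executor logs:
k logs -n spark-demo -l spark-role=executor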
Spark Application Management¶
K8S_SERVER=$(kubectl config view --output=jsonpath='{.clusters[].cluster.server}')
Application Status¶
./bin/spark-submit \
--master k8s://$K8S_SERVER \
--status "spark-demo:$POD_NAME"
Application status (driver):
pod name: pf-demo
namespace: spark-demo
labels: spark-app-selector -> spark-0df2be7b2d8d40299e7a406564c9833c, spark-role -> driver
pod uid: 30a749a3-1060-49a7-b502-4a054ea33d30
creation time: 2021-02-09T13:18:14Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-hqc6k
node name: minikube
start time: 2021-02-09T13:18:14Z
phase: Running
container status:
container name: spark-kubernetes-driver
container image: pf-demo:v1.0.0
container state: running
container started at: 2021-02-09T13:18:15Z
Stop Spark Application¶
./bin/spark-submit \
--master k8s://$K8S_SERVER \
--kill "spark-demo:$POD_NAME"
Clean Up¶
Clean up the cluster as described in Demo: spark-shell on minikube.
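If you created the demo resources declaratively with rbac.yml above, the demo-specific part of the cleanup is as simple as deleting them (and stopping minikube if you no longer need the cluster):
k delete -f rbac.yml
# Optional: stop the cluster
minikube stop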
That's it. Congratulations!