This repository also provides access to the Spark History Server (SHS) on Kubernetes (k8s). The SHS is a web UI for viewing information about completed Spark applications. It runs as a service separate from the Spark applications themselves.
The SHS service is deployed with a Helm chart and can be reached publicly, so no manual port-forwarding is needed. For more information, please refer to the SHS Chart.
- Helm
- A running EKS cluster
The Docker image can be built by running the following command:
docker build -t <repo>:<tag> -f Dockerfile .
Then push to your Docker registry:
docker push <repo>:<tag>
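The build and push steps can be wrapped in a small helper so that the same image reference is used for both. This is only a sketch: the default values of REPO and TAG below are hypothetical placeholders, and it assumes you are already logged in to your registry.

```shell
# Hypothetical defaults; override with your own registry path and tag.
REPO="${REPO:-example.registry.io/spark-history-server}"
TAG="${TAG:-latest}"

# Print the full image reference in <repo>:<tag> form.
image_ref() {
  printf '%s:%s\n' "$REPO" "$TAG"
}

# Build from the Dockerfile in the current directory, then push.
# The push only runs if the build succeeds.
build_and_push() {
  docker build -t "$(image_ref)" -f Dockerfile . &&
    docker push "$(image_ref)"
}
```

Call build_and_push once REPO and TAG are set to your own values.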
helm repo add stable https://charts.helm.sh/stable
- Save your AWS access key to a file named aws-access-key.
- Save your AWS secret key to a file named aws-secret-key.
- Generate the aws-secrets secret for k8s by running the following command:
kubectl create secret generic aws-secrets --from-file=aws-access-key --from-file=aws-secret-key
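One thing to watch for: --from-file stores the file contents byte for byte, so a trailing newline left by an editor or by echo becomes part of the stored credential. The sketch below writes both files without a newline before creating the secret. It assumes the keys are available in the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which is an assumption of this example and not something the repository requires.

```shell
# Write a value to a file with no trailing newline.
# $1: target file name, $2: value
write_key_file() {
  printf '%s' "$2" > "$1"
}

# Create the two key files, then the aws-secrets secret.
# Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set (assumption).
create_aws_secrets() {
  write_key_file aws-access-key "$AWS_ACCESS_KEY_ID" &&
    write_key_file aws-secret-key "$AWS_SECRET_ACCESS_KEY" &&
    kubectl create secret generic aws-secrets \
      --from-file=aws-access-key --from-file=aws-secret-key
}
```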
At a minimum, configure the SHS yaml file by setting the following parameters:
- s3.logDirectory: the S3 bucket path where the Spark application logs are stored
- image.repository & image.tag: the Docker image repository and tag
More configurable parameters can be found in SHS Chart Configurations.
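The dotted parameter names above correspond to nested keys in a Helm values file. As a rough illustration, the following writes a small override file containing only those three parameters. The file name shs_overrides.yaml, the bucket path, and the image values are all hypothetical placeholders, and the s3a:// scheme is an assumption; check the repository's SHS yaml file for S3 for the exact form it expects.

```shell
# Write a small override file; every value below is a hypothetical placeholder.
cat > shs_overrides.yaml <<'EOF'
s3:
  logDirectory: s3a://my-bucket/spark-logs
image:
  repository: example.registry.io/spark-history-server
  tag: latest
EOF
```

Helm accepts several -f flags and gives precedence to the last one, so a file like this can be passed after the main SHS yaml file instead of editing it in place.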
- Generate a credential key JSON file for your service account, save it to your local machine, and name it key.json.
- Generate the history-secrets secret for k8s by running the following command:
kubectl -n default create secret generic history-secrets --from-file=key.json
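With --from-file, the file name becomes the key inside the secret, so the file should keep the name key.json. A small guard such as the following sketch (the function name is hypothetical) avoids creating a secret from a missing or empty key file:

```shell
# Create the history-secrets secret only if key.json exists and is non-empty.
create_history_secret() {
  if [ ! -s key.json ]; then
    echo "key.json is missing or empty" >&2
    return 1
  fi
  kubectl -n default create secret generic history-secrets --from-file=key.json
}
```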
At a minimum, configure the SHS yaml file by setting the following parameters:
- gcs.logDirectory: the GCS bucket path where the Spark application logs are stored
- image.repository & image.tag: the Docker image repository and tag
Choose the SHS yaml file for S3 or the SHS yaml file for GCS based on your cloud provider. The following command uses the SHS yaml file for S3 as an example.
helm install stable/spark-history-server --namespace default -f shs_s3.yaml --generate-name
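As an alternative to --generate-name, helm install also accepts an explicit release name as its first argument, which makes the release easier to refer to later. A minimal sketch, where the function name and the release name shs in the usage line are hypothetical:

```shell
# Install the chart under a fixed release name.
# $1: release name, $2: values file
install_shs() {
  helm install "$1" stable/spark-history-server --namespace default -f "$2"
}

# Usage (hypothetical release name): install_shs shs shs_s3.yaml
```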
Check SHS service status by running the following command:
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 3h17m
spark-history-server-1668577003 LoadBalancer 10.100.26.191 a461246ba4b634bcda15c494946b97f1-688240214.us-west-2.elb.amazonaws.com 18080:31714/TCP 81m
The EXTERNAL-IP column shows the public address of the SHS service; in this example it is a load balancer hostname rather than an IP address.
Open EXTERNAL-IP:18080 in a browser, and you should see the SHS UI.
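The address can also be read programmatically rather than copied from the table. The following sketch uses kubectl's jsonpath output. It tries the hostname field first (as in the EKS output above) and falls back to the ip field, which some providers populate instead. The function names are hypothetical, and plain HTTP on port 18080 is assumed.

```shell
# Print the load balancer address of a service.
# $1: service name
shs_address() {
  host="$(kubectl get svc "$1" -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
  if [ -z "$host" ]; then
    # No hostname reported; fall back to the ip field.
    host="$(kubectl get svc "$1" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
  fi
  printf '%s\n' "$host"
}

# Print the URL of the SHS UI, assuming plain HTTP on port 18080.
shs_url() {
  printf 'http://%s:18080\n' "$(shs_address "$1")"
}

# Usage: shs_url spark-history-server-1668577003
```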
Instead of deleting the pod with kubectl, use helm to un-deploy the SHS service:
helm uninstall spark-history-server-1668577003
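Because --generate-name appends a generated suffix, the release name differs for each installation. A possible way to look it up, sketched with hypothetical function names, is to filter helm list by the chart's name prefix:

```shell
# List release names in the default namespace that start with the chart name.
list_shs_releases() {
  helm list --namespace default --short --filter '^spark-history-server'
}

# Uninstall every matching release.
uninstall_shs_releases() {
  for release in $(list_shs_releases); do
    helm uninstall "$release" --namespace default
  done
}
```

Running list_shs_releases on its own first is a reasonable check before uninstalling anything.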