add spark-operator-v2 #238

Open · wants to merge 4 commits into base: main
9 changes: 9 additions & 0 deletions Makefile
@@ -1,2 +1,11 @@
lint:
	docker run -v $(shell pwd):/mnt -w=/mnt quay.io/helmpack/chart-testing ct lint --config .github/ct-config.yaml

spark-operator-v2:
	curl -L -O https://github.com/spotinst/spark-on-k8s-operator/archive/ocean-spark-v2.zip
	unzip ocean-spark-v2.zip
	rm ocean-spark-v2.zip
	rm -rf ./charts/spark-operator-v2/charts/spark-operator
	mv ./spark-on-k8s-operator-ocean-spark-v2/charts/spark-operator-chart ./charts/spark-operator-v2/charts/spark-operator
	rm -rf spark-on-k8s-operator-ocean-spark-v2
	cp -r ./charts/spark-operator-v2/charts/spark-operator/crds ./charts/bigdata-crds
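
The new target is meant to be run from the repository root; a minimal invocation, assuming `curl` and `unzip` are available on the `PATH`, would be:

```shell
# Re-vendor the operator chart and refresh the shared CRDs
make spark-operator-v2
```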
4 changes: 2 additions & 2 deletions charts/spark-operator-static/Chart.yaml
@@ -1,8 +1,8 @@
apiVersion: v2
description: Spark Operator (static part)
name: spark-operator-static
-version: 0.1.6
-appVersion: 0.1.6
+version: 0.1.7
+appVersion: 0.1.7
maintainers:
- name: thorsteinnth
  email: [email protected]
@@ -1,34 +1,34 @@
apiVersion: v1
kind: ConfigMap
metadata:
-  name: {{ include "spark-operator.fullname" . }}-telemetry-cm
+  name: spark-operator-static-telemetry-cm
  labels:
-    {{- include "spark-operator.labels" . | nindent 4 }}
+    {{- include "spark-operator-static.labels" . | nindent 4 }}
data:
  custom-filters.conf: |
    [FILTER]
        Name grep
        Match kube.*
-        Regex $kubernetes['labels']['bigdata.spot.io/component'] {{ index .Values.podLabels "bigdata.spot.io/component" }}
+        Logical_Op or
+        Regex $kubernetes['labels']['bigdata.spot.io/component'] spark-operator-controller
+        Regex $kubernetes['labels']['bigdata.spot.io/component'] spark-operator-webhook

  metrics-collection.conf: |
    ## Configuration for collecting metrics from the spark operator
-    {{- if .Values.metrics.enable }}
    [INPUT]
        name prometheus_scrape
        host 0.0.0.0
-        port {{ .Values.metrics.port }}
-        tag {{ include "spark-operator.fullname" . }}
-        metrics_path {{ .Values.metrics.endpoint }}
+        port 8080
+        tag spark-operator
+        metrics_path /metrics
        scrape_interval 5s

    [OUTPUT]
        Name prometheus_remote_write
-        Match {{ include "spark-operator.fullname" . }}
+        Match spark-operator
        Host bigdata-telemetry-thanos-receiver-svc.{{ .Release.Namespace }}.svc.cluster.local
        Port 19291
        uri /api/v1/receive
        tls off
        tls.verify off
        Workers 1
-    {{- end }}
6 changes: 6 additions & 0 deletions charts/spark-operator-v2/Chart.lock
@@ -0,0 +1,6 @@
dependencies:
- name: spark-operator
  repository: ""
  version: 2.0.2
digest: sha256:ec647402ce487be17941edc3508f3e41e2919833d666df2a956f9a3964944d05
generated: "2024-10-25T16:46:45.656964+02:00"
12 changes: 12 additions & 0 deletions charts/spark-operator-v2/Chart.yaml
@@ -0,0 +1,12 @@
apiVersion: v2
name: spark-operator-v2
description: A Helm chart for Spark on Kubernetes operator.
version: 2.0.2
appVersion: 2.0.2
home: https://github.com/kubeflow/spark-operator
maintainers:
- name: thorsteinnth
  email: [email protected]
dependencies:
- name: spark-operator
  version: "2.0.2"
39 changes: 39 additions & 0 deletions charts/spark-operator-v2/charts/spark-operator/.helmignore
@@ -0,0 +1,39 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.

ci/
.helmignore

# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/

# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~

# Various IDEs
*.tmproj
.project
.idea/
.vscode/

# MacOS
.DS_Store

# helm-unittest
tests
.debug
__snapshot__

# helm-docs
README.md.gotmpl
39 changes: 39 additions & 0 deletions charts/spark-operator-v2/charts/spark-operator/Chart.yaml
@@ -0,0 +1,39 @@
#
# Copyright 2024 The Kubeflow authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: v2

name: spark-operator

description: A Helm chart for Spark on Kubernetes operator.

version: 2.0.2

appVersion: 2.0.2

keywords:
- apache spark
- big data

home: https://github.com/kubeflow/spark-operator

maintainers:
- name: yuchaoran2011
  email: [email protected]
  url: https://github.com/yuchaoran2011
- name: ChenYi015
  email: [email protected]
  url: https://github.com/ChenYi015
177 changes: 177 additions & 0 deletions charts/spark-operator-v2/charts/spark-operator/README.md
@@ -0,0 +1,177 @@
# spark-operator

![Version: 2.0.2](https://img.shields.io/badge/Version-2.0.2-informational?style=flat-square) ![AppVersion: 2.0.2](https://img.shields.io/badge/AppVersion-2.0.2-informational?style=flat-square)

A Helm chart for Spark on Kubernetes operator.

**Homepage:** <https://github.com/kubeflow/spark-operator>

## Introduction

This chart bootstraps a [Kubernetes Operator for Apache Spark](https://github.com/kubeflow/spark-operator) deployment using the [Helm](https://helm.sh) package manager.

## Prerequisites

- Helm >= 3
- Kubernetes >= 1.16

## Previous Helm Chart

The previous `spark-operator` Helm chart hosted at [helm/charts](https://github.com/helm/charts) has been moved to this repository in accordance with the [Deprecation timeline](https://github.com/helm/charts#deprecation-timeline). Note that a few things have changed between this version and the old version:

- This repository **only** supports Helm chart installations using Helm 3+ since the `apiVersion` on the chart has been marked as `v2`.
- Previous versions of the Helm chart have not been migrated, and the version has been set to `1.0.0` at the onset. If you are looking for old versions of the chart, it's best to run `helm pull incubator/sparkoperator --version <your-version>` until you are ready to move to this repository's version.
- Several configuration properties have been changed; carefully review the [values](#values) section below to make sure you're aligned with the new values.

## Usage

### Add Helm Repo

```shell
helm repo add spark-operator https://kubeflow.github.io/spark-operator

helm repo update
```

See [helm repo](https://helm.sh/docs/helm/helm_repo) for command documentation.

### Install the chart

```shell
helm install [RELEASE_NAME] spark-operator/spark-operator
```

For example, if you want to create a release with name `spark-operator` in the `spark-operator` namespace:

```shell
helm install spark-operator spark-operator/spark-operator \
--namespace spark-operator \
--create-namespace
```

Note that passing the `--create-namespace` flag to the `helm install` command makes `helm` create the release namespace if it does not exist.

See [helm install](https://helm.sh/docs/helm/helm_install) for command documentation.

### Upgrade the chart

```shell
helm upgrade [RELEASE_NAME] spark-operator/spark-operator [flags]
```

See [helm upgrade](https://helm.sh/docs/helm/helm_upgrade) for command documentation.
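
As an illustration, an upgrade that overrides one of the documented values (here `controller.workers`; release name and namespace reused from the install example above) might look like:

```shell
helm upgrade spark-operator spark-operator/spark-operator \
  --namespace spark-operator \
  --set controller.workers=20
```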

### Uninstall the chart

```shell
helm uninstall [RELEASE_NAME]
```

This removes all the Kubernetes resources associated with the chart and deletes the release, except for the CRDs, which have to be removed manually.

See [helm uninstall](https://helm.sh/docs/helm/helm_uninstall) for command documentation.
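
To also remove the leftover CRDs, a sketch along these lines should work. The CRD names below are assumed from the operator's `sparkoperator.k8s.io` API group, so list them first to verify:

```shell
# List the operator's CRDs before deleting them
kubectl get crds | grep sparkoperator

# Assumed CRD names; adjust to match the output above
kubectl delete crd sparkapplications.sparkoperator.k8s.io \
  scheduledsparkapplications.sparkoperator.k8s.io
```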

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| nameOverride | string | `""` | String to partially override release name. |
| fullnameOverride | string | `""` | String to fully override release name. |
| commonLabels | object | `{}` | Common labels to add to the resources. |
| image.registry | string | `"docker.io"` | Image registry. |
| image.repository | string | `"kubeflow/spark-operator"` | Image repository. |
| image.tag | string | If not set, the chart appVersion will be used. | Image tag. |
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy. |
| image.pullSecrets | list | `[]` | Image pull secrets for private image registry. |
| controller.replicas | int | `1` | Number of replicas of controller. |
| controller.workers | int | `10` | Reconcile concurrency; higher values might increase memory usage. |
| controller.logLevel | string | `"info"` | Configure the verbosity of logging, can be one of `debug`, `info`, `error`. |
| controller.maxTrackedExecutorPerApp | int | `1000` | Specifies the maximum number of Executor pods that can be tracked by the controller per SparkApplication. |
| controller.uiService.enable | bool | `true` | Specifies whether to create service for Spark web UI. |
| controller.uiIngress.enable | bool | `false` | Specifies whether to create ingress for Spark web UI. `controller.uiService.enable` must be `true` to enable ingress. |
| controller.uiIngress.urlFormat | string | `""` | Ingress URL format. Required if `controller.uiIngress.enable` is true. |
| controller.uiIngress.ingressClassName | string | `""` | Optionally set the ingressClassName. |
| controller.batchScheduler.enable | bool | `false` | Specifies whether to enable a batch scheduler for scheduling Spark jobs. If enabled, users can specify the batch scheduler name in the Spark application. |
| controller.batchScheduler.kubeSchedulerNames | list | `[]` | Specifies a list of kube-scheduler names for scheduling Spark pods. |
| controller.batchScheduler.default | string | `""` | Default batch scheduler to be used if not specified by the user. If specified, this value must be either "volcano" or "yunikorn". Specifying any other value will cause the controller to error on startup. |
| controller.serviceAccount.create | bool | `true` | Specifies whether to create a service account for the controller. |
| controller.serviceAccount.name | string | `""` | Optional name for the controller service account. |
| controller.serviceAccount.annotations | object | `{}` | Extra annotations for the controller service account. |
| controller.rbac.create | bool | `true` | Specifies whether to create RBAC resources for the controller. |
| controller.rbac.annotations | object | `{}` | Extra annotations for the controller RBAC resources. |
| controller.labels | object | `{}` | Extra labels for controller pods. |
| controller.annotations | object | `{}` | Extra annotations for controller pods. |
| controller.volumes | list | `[]` | Volumes for controller pods. |
| controller.nodeSelector | object | `{}` | Node selector for controller pods. |
| controller.affinity | object | `{}` | Affinity for controller pods. |
| controller.tolerations | list | `[]` | List of node taints to tolerate for controller pods. |
| controller.priorityClassName | string | `""` | Priority class for controller pods. |
| controller.podSecurityContext | object | `{"fsGroup":185}` | Security context for controller pods. |
| controller.topologySpreadConstraints | list | `[]` | Topology spread constraints rely on node labels to identify the topology domain(s) that each Node is in. Ref: [Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/). The labelSelector field in topology spread constraint will be set to the selector labels for controller pods if not specified. |
| controller.env | list | `[]` | Environment variables for controller containers. |
| controller.envFrom | list | `[]` | Environment variable sources for controller containers. |
| controller.volumeMounts | list | `[]` | Volume mounts for controller containers. |
| controller.resources | object | `{}` | Pod resource requests and limits for controller containers. Note that each job submission will spawn a JVM within the controller pods using "/usr/local/openjdk-11/bin/java -Xmx128m". Kubernetes may kill these Java processes at will to enforce resource limits. When that happens, you will see the error 'failed to run spark-submit for SparkApplication [...]: signal: killed'; if so, consider increasing the memory limits. |
| controller.securityContext | object | `{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"privileged":false,"runAsNonRoot":true}` | Security context for controller containers. |
| controller.sidecars | list | `[]` | Sidecar containers for controller pods. |
| controller.podDisruptionBudget.enable | bool | `false` | Specifies whether to create pod disruption budget for controller. Ref: [Specifying a Disruption Budget for your Application](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) |
| controller.podDisruptionBudget.minAvailable | int | `1` | The number of pods that must be available. Requires `controller.replicas` to be greater than 1. |
| controller.pprof.enable | bool | `false` | Specifies whether to enable pprof. |
| controller.pprof.port | int | `6060` | Specifies pprof port. |
| controller.pprof.portName | string | `"pprof"` | Specifies pprof service port name. |
| controller.workqueueRateLimiter.bucketQPS | int | `50` | Specifies the average rate of items processed by the workqueue rate limiter. |
| controller.workqueueRateLimiter.bucketSize | int | `500` | Specifies the maximum number of items that can be in the workqueue at any given time. |
| controller.workqueueRateLimiter.maxDelay.enable | bool | `true` | Specifies whether to enable max delay for the workqueue rate limiter. This is useful to avoid losing events when the workqueue is full. |
| controller.workqueueRateLimiter.maxDelay.duration | string | `"6h"` | Specifies the maximum delay duration for the workqueue rate limiter. |
| webhook.enable | bool | `true` | Specifies whether to enable webhook. |
| webhook.replicas | int | `1` | Number of replicas of webhook server. |
| webhook.logLevel | string | `"info"` | Configure the verbosity of logging, can be one of `debug`, `info`, `error`. |
| webhook.port | int | `9443` | Specifies webhook port. |
| webhook.portName | string | `"webhook"` | Specifies webhook service port name. |
| webhook.failurePolicy | string | `"Fail"` | Specifies how unrecognized errors are handled. Available options are `Ignore` or `Fail`. |
| webhook.timeoutSeconds | int | `10` | Specifies the timeout seconds of the webhook, the value must be between 1 and 30. |
| webhook.resourceQuotaEnforcement.enable | bool | `false` | Specifies whether to enable the ResourceQuota enforcement for SparkApplication resources. |
| webhook.serviceAccount.create | bool | `true` | Specifies whether to create a service account for the webhook. |
| webhook.serviceAccount.name | string | `""` | Optional name for the webhook service account. |
| webhook.serviceAccount.annotations | object | `{}` | Extra annotations for the webhook service account. |
| webhook.rbac.create | bool | `true` | Specifies whether to create RBAC resources for the webhook. |
| webhook.rbac.annotations | object | `{}` | Extra annotations for the webhook RBAC resources. |
| webhook.labels | object | `{}` | Extra labels for webhook pods. |
| webhook.annotations | object | `{}` | Extra annotations for webhook pods. |
| webhook.sidecars | list | `[]` | Sidecar containers for webhook pods. |
| webhook.volumes | list | `[]` | Volumes for webhook pods. |
| webhook.nodeSelector | object | `{}` | Node selector for webhook pods. |
| webhook.affinity | object | `{}` | Affinity for webhook pods. |
| webhook.tolerations | list | `[]` | List of node taints to tolerate for webhook pods. |
| webhook.priorityClassName | string | `""` | Priority class for webhook pods. |
| webhook.podSecurityContext | object | `{"fsGroup":185}` | Security context for webhook pods. |
| webhook.topologySpreadConstraints | list | `[]` | Topology spread constraints rely on node labels to identify the topology domain(s) that each Node is in. Ref: [Pod Topology Spread Constraints](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/). The labelSelector field in topology spread constraint will be set to the selector labels for webhook pods if not specified. |
| webhook.env | list | `[]` | Environment variables for webhook containers. |
| webhook.envFrom | list | `[]` | Environment variable sources for webhook containers. |
| webhook.volumeMounts | list | `[]` | Volume mounts for webhook containers. |
| webhook.resources | object | `{}` | Pod resource requests and limits for webhook pods. |
| webhook.securityContext | object | `{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"privileged":false,"runAsNonRoot":true}` | Security context for webhook containers. |
| webhook.podDisruptionBudget.enable | bool | `false` | Specifies whether to create pod disruption budget for webhook. Ref: [Specifying a Disruption Budget for your Application](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) |
| webhook.podDisruptionBudget.minAvailable | int | `1` | The number of pods that must be available. Requires `webhook.replicas` to be greater than 1. |
| spark.jobNamespaces | list | `["default"]` | List of namespaces in which to run Spark jobs. If an empty string is included, all namespaces are allowed. Make sure the namespaces already exist. |
| spark.serviceAccount.create | bool | `true` | Specifies whether to create a service account for spark applications. |
| spark.serviceAccount.name | string | `""` | Optional name for the spark service account. |
| spark.serviceAccount.annotations | object | `{}` | Optional annotations for the spark service account. |
| spark.rbac.create | bool | `true` | Specifies whether to create RBAC resources for spark applications. |
| spark.rbac.annotations | object | `{}` | Optional annotations for the spark application RBAC resources. |
| prometheus.metrics.enable | bool | `true` | Specifies whether to enable prometheus metrics scraping. |
| prometheus.metrics.port | int | `8080` | Metrics port. |
| prometheus.metrics.portName | string | `"metrics"` | Metrics port name. |
| prometheus.metrics.endpoint | string | `"/metrics"` | Metrics serving endpoint. |
| prometheus.metrics.prefix | string | `""` | Metrics prefix, will be added to all exported metrics. |
| prometheus.podMonitor.create | bool | `false` | Specifies whether to create a pod monitor. Note that Prometheus metrics should be enabled as well. |
| prometheus.podMonitor.labels | object | `{}` | Pod monitor labels. |
| prometheus.podMonitor.jobLabel | string | `"spark-operator-podmonitor"` | The label to use to retrieve the job name from. |
| prometheus.podMonitor.podMetricsEndpoint | object | `{"interval":"5s","scheme":"http"}` | Prometheus metrics endpoint properties. `metrics.portName` will be used as the port. |
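
As an illustration only, a small override file built from the values above could be written and used like this (the `spark-jobs` namespace is a hypothetical example):

```shell
# Illustrative values override; all keys come from the table above
cat > my-values.yaml <<'EOF'
controller:
  replicas: 2
  logLevel: debug
webhook:
  enable: true
spark:
  jobNamespaces:
    - default
    - spark-jobs   # hypothetical namespace; must already exist
EOF

helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator \
  --create-namespace \
  -f my-values.yaml
```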

## Maintainers

| Name | Email | Url |
| ---- | ------ | --- |
| yuchaoran2011 | <[email protected]> | <https://github.com/yuchaoran2011> |
| ChenYi015 | <[email protected]> | <https://github.com/ChenYi015> |