feat: 📦 Provide RudderStack Chart with HPA #44

Open: wants to merge 18 commits into base: master
26 changes: 26 additions & 0 deletions .github/workflows/ci-release.yml
@@ -0,0 +1,26 @@
name: CI Release Charts

on:
  push:
    branches:
      - main

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Configure Git
        run: |
          git config user.name "$GITHUB_ACTOR"
          git config user.email "[email protected]"

      - name: Run chart-releaser
        uses: helm/[email protected]
        env:
          CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
          CR_SKIP_EXISTING: true
10 changes: 9 additions & 1 deletion .gitignore
@@ -1 +1,9 @@
*.DS_Store
**.DS_Store
_config.yml

# Helm
**/Chart.lock
*/*/charts/

# Idea
.idea
1 change: 1 addition & 0 deletions CODEOWNERS
@@ -0,0 +1 @@
* @Typeform/data-engineering
123 changes: 107 additions & 16 deletions README.md
@@ -1,27 +1,30 @@
# What is RudderStack?

[RudderStack](https://rudderstack.com/) is a **customer data pipeline** tool for collecting, routing and processing data from your websites, apps, cloud tools, and data warehouse.
[RudderStack](https://rudderstack.com/) is a **customer data pipeline** tool for collecting, routing and processing data
from your websites, apps, cloud tools, and data warehouse.

More information on RudderStack can be found [here](https://github.com/rudderlabs/rudder-server).

## TL;DR;

```bash
$ git clone git@github.com:rudderlabs/rudderstack-helm.git
$ cd rudderstack-helm/
$ cd rudderstack-helm/charts/rudderstack
$ helm dependency build
$ helm install my-release ./ --set rudderWorkspaceToken="<workspace token from the dashboard>"
```

## Introduction

The RudderStack Helm chart creates a Rudderstack deployment on a [Kubernetes](http://kubernetes.io) cluster
using the [Helm](https://helm.sh) package manager.
The RudderStack Helm chart creates a Rudderstack deployment on a [Kubernetes](http://kubernetes.io) cluster using
the [Helm](https://helm.sh) package manager.

## Prerequisites

- Kubectl installed and connected to your kubernetes cluster
- Helm installed
- Workspace token from the [RudderStack dashboard](https://app.rudderstack.com). Set up your account and copy your workspace token from the top of the home page.
- Workspace token from the [RudderStack dashboard](https://app.rudderstack.com). Set up your account and copy your
workspace token from the top of the home page.

## Installing the Chart

@@ -31,7 +34,9 @@ To install the chart with the release name `my-release`, from the root directory
$ helm install my-release ./ --set rudderWorkspaceToken="<workspace token from the dashboard>"
```

The command deploys Rudderstack on the default Kubernetes cluster configured with `kubectl`. The [configuration](#configuration) section lists the most significant parameters that can be configured during deployment.
The command deploys Rudderstack on the default Kubernetes cluster configured with `kubectl`.
The [configuration](#configuration) section lists the most significant parameters that can be configured during
deployment.

## Upgrading the Chart

@@ -51,21 +56,104 @@ $ helm uninstall my-release

This removes all the components created by this chart.

## Developing the Chart

To check that proposed changes render and would apply cleanly, run a client-side dry run:

```bash
helm template ./ | kubectl apply --dry-run=client -f -
```

## Postgres dependency

The chart supports three ways to provide Postgres as a dependency:

- Deploying it as a **Sidecar** in the same stateful resource.
- Deploying a new Statefulset with Postgres.
- Providing an external Postgres.

### Sidecar mode

To enable the sidecar mode, specify:

```yaml
postgresql:
  mode: sidecar
  statefulset_enabled: false
```

### Stateful mode

To enable the statefulset mode, specify:

```yaml
postgresql:
  mode: statefulset
  statefulset_enabled: true
```
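
The two modes above can also be selected at install time. The following is a minimal sketch, assuming `mode` and `statefulset_enabled` are both nested under `postgresql` in `values.yaml` (the snippets above suggest this, but the nesting is an assumption). `postgres_mode_flags` is a hypothetical helper, not part of the chart, and the example only prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper (not part of the chart): prints the --set flags
# matching the two Postgres modes described in this README.
# Assumes both keys live under `postgresql` in values.yaml.
postgres_mode_flags() {
  case "$1" in
    sidecar)
      echo "--set postgresql.mode=sidecar --set postgresql.statefulset_enabled=false" ;;
    statefulset)
      echo "--set postgresql.mode=statefulset --set postgresql.statefulset_enabled=true" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Print (do not run) the install command for sidecar mode
echo "helm install my-release ./ $(postgres_mode_flags sidecar)"
```

Keeping the two flags together in one place avoids setting `mode` without the matching `statefulset_enabled` value.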

## HPA : Horizontal Pod Autoscaling

> Only recommended with **postgresql sidecar mode enabled**.

> Currently supported only with `backend.controlPlaneJSON: true`, because the **[pre-stop hook](charts/rudderstack/pre-stop.sh)**
> reads the source IDs from the local config to confirm that all events have reached their destinations, so no event
> is lost when scaling down.

Horizontal Pod Autoscaling can be enabled when resource efficiency is a requirement:

```yaml
backend:
  terminationGracePeriodSeconds: xx
  lifecycleSleepTime: xx
  hpa:
    enabled: true
```

Also, set both `lifecycleSleepTime` and `terminationGracePeriodSeconds` to values greater
than `BatchRouter.uploadFreqInS`; otherwise Kubernetes will kill the pods before they flush their data to the destinations.
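
The constraint above can be checked before deploying. The following is a minimal sketch, assuming the three values are available as plain integers in seconds. `check_drain_window` is a hypothetical helper and not part of the chart, and the `60` and `120` in the example are illustrative values only:

```shell
#!/bin/sh
# Hypothetical helper (not part of the chart): verifies that both
# lifecycleSleepTime and terminationGracePeriodSeconds are greater than
# BatchRouter.uploadFreqInS. All arguments are integers in seconds.
check_drain_window() {
  upload_freq=$1
  sleep_time=$2
  grace_period=$3

  if [ "$sleep_time" -le "$upload_freq" ]; then
    echo "lifecycleSleepTime ($sleep_time) must be greater than uploadFreqInS ($upload_freq)"
    return 1
  fi
  if [ "$grace_period" -le "$upload_freq" ]; then
    echo "terminationGracePeriodSeconds ($grace_period) must be greater than uploadFreqInS ($upload_freq)"
    return 1
  fi
  echo "ok"
}

# uploadFreqInS of 30 (as set in this PR) with illustrative values for the other two
check_drain_window 30 60 120
```

A check like this could run in CI next to the dry run, so that a too-small grace period is caught before pods are terminated mid-upload.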

## Open-source Control Plane

If you are using open-source config-generator UI, you need to set the parameter `controlPlaneJSON` to `true` in the `values.yaml` file. Export workspace-config from the config-generator and copy/paste the contents into the `workspaceConfig.json` file.
If you are using the open-source config-generator UI, set the parameter `controlPlaneJSON` to `true` in
the `values.yaml` file. Then export the workspace config from the config-generator and paste its contents into
the `workspaceConfig.json` file.

```bash
$ helm install my-release ./ --set backend.controlPlaneJSON=true
```

## Extending the Chart

Since the Chart is published under the {{ TBC by the RudderStack team }} page, you can extend it
by adding it as a dependency of your own Chart. There is no need to git clone this repo to deploy RudderStack
open-source into your infrastructure.

```yaml
apiVersion: v2
name: rudderstack
description: Customer Data Pipeline tool for collecting, routing and processing data.
maintainers:
  - name: Data Platform
    email: [email protected]
version: 0.4.5
appVersion: 1.16.0
dependencies:
  # https://github.com/rudderlabs/rudderstack-helm
  - name: rudderstack
    version: 0.4.5
    repository: https://TBC.github.io/rudderstack-helm # To Be Confirmed by the RudderStack team
```

## GCP

If you are using Google Cloud Storage or Google BigQuery for the following cases, you have to replace the contents of the file [rudder-google-application-credentials.json](rudder-google-application-credentials.json) with your service account:
If you are using Google Cloud Storage or Google BigQuery for the following cases, you have to replace the contents of
the file [rudder-google-application-credentials.json](charts/rudderstack/rudder-google-application-credentials.json)
with your service account:

- GCS as a destination
- GCS for dumping jobs
- BigQuery as a warehouse destination.
- GCS as a destination
- GCS for dumping jobs
- BigQuery as a warehouse destination.

## Configuration

@@ -83,7 +171,8 @@ The following table lists the configurable parameters of the Rudderstack chart a
| `backend.extraEnvVars` | Extra environment variables to be used by the backend in the deployments | `Refer to the values.yaml file` |
| `backend.controlPlaneJSON` | If `true`, backend will read config from the workspaceConfig.json file | `false` |

Each of these parameters can be changed in `values.yaml`. Or specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example:
Each of these parameters can be changed in `values.yaml`. Alternatively, specify each parameter using
the `--set key=value[,key=value]` argument to `helm install`. For example:

```bash
$ helm install my-release \
@@ -92,6 +181,7 @@ $ helm install --name my-release \
```

**Note:** Configuration specific to:

- Backend can be edited in [rudder-config.yaml](https://docs.rudderlabs.com/administrators-guide/config-parameters).
- PostgreSQL can be edited in `pg_hba.conf`, `postgresql.conf`

@@ -100,15 +190,16 @@ $ helm install --name my-release \
Installing this Helm chart will deploy the following pods and containers in the configured cluster:

#### POD - {Release name}-rudderstack-0 :

- rudderstack-backend
- rudderstack-telegraf-sidecar

#### POD - {Release name}-rudderstack-postgresql-0 :
- {Release name}-rudderstack-postgresql
- rudderstack-postgresql-sidecar

#### POD - {Release name}-rudderstack-transformer-xxxxxxxxxx-xxxxx:

- transformer

## Contact Us

For any queries related to using the RudderStack Helm Chart, feel free to start a conversation on our [Slack](https://resources.rudderstack.com/join-rudderstack-slack) channel.
For any queries related to using the RudderStack Helm Chart, feel free to start a conversation on
our [Slack](https://resources.rudderstack.com/join-rudderstack-slack) channel.
24 changes: 11 additions & 13 deletions Chart.yaml → charts/rudderstack/Chart.yaml
@@ -14,21 +14,19 @@ type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
version: 0.3.0
version: 0.4.9

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application.
appVersion: 1.16.0

# WIP
#dependencies:
# - name: nginx-ingress
# version: ~1.6.0
# repository: https://helm.nginx.com/stable
# condition: (optional) A yaml path that resolves to a boolean, used for enabling/disabling charts (e.g. subchart1.enabled )
# tags: # (optional)
# - Tags can be used to group charts for enabling/disabling together
# enabled: (optional) Enabled bool determines if chart should be loaded
# import-values: # (optional)
# - ImportValues holds the mapping of source values to parent key to be imported. Each item can be a string or pair of child/parent sublist items.
# alias: (optional) Alias usable alias to be used for the chart. Useful when you have to add the same chart multiple times
dependencies:
  - name: transformer
    version: 0.1.0
    repository: file://../subcharts/transformer
    condition: transformer.enabled

  - name: postgresql
    version: 7.7.2
    repository: file://../subcharts/postgresql
    condition: postgresql.statefulset_enabled
File renamed without changes.
File renamed without changes.
50 changes: 50 additions & 0 deletions charts/rudderstack/pre-stop.sh
@@ -0,0 +1,50 @@
#!/bin/sh
# Pre-stop hook: blocks pod shutdown until no source reports pending events.
START=$(date +%s.%N)

# Install JQ
JQ=/usr/bin/jq
curl -sLo "$JQ" https://stedolan.github.io/jq/download/linux64/jq
chmod +x "$JQ"

# Extract source IDs from the local workspace config
SOURCE_IDS=$("$JQ" -r '.sources[].id' "$RSERVER_BACKEND_CONFIG_CONFIG_JSONPATH")
STATSD_PREFIX="rudder_server_pre_stop"

# Emit Start datum to Telegraf
echo "${STATSD_PREFIX}_start:1|c" | nc -w 1 -u localhost 8125

# Check pending events per Source ID
HAS_PENDING_EVENTS=true
while $HAS_PENDING_EVENTS ; do
  TOTAL_PENDING_EVENTS=0
  echo "1. Checking pending events...."
  echo "${STATSD_PREFIX}_check:1|c" | nc -w 1 -u localhost 8125

  for source_id in $SOURCE_IDS; do
    PENDING_EVENTS=$(curl -s --request POST --url http://localhost:8080/v1/pending-events \
      -u "${source_id}:potato" --header 'Content-Type: application/json' \
      --data "{\"source_id\": \"${source_id}\"}" | "$JQ" .pending_events)
    echo "2. Pending $PENDING_EVENTS events for Source [$source_id]"
    TOTAL_PENDING_EVENTS=$(( $TOTAL_PENDING_EVENTS + $PENDING_EVENTS ))
  done

  echo "3. Checking result..."
  echo "${STATSD_PREFIX}_pending_events:$TOTAL_PENDING_EVENTS|g" | nc -w 1 -u localhost 8125
  # POSIX test: [[ ]] is a bashism and is not guaranteed under /bin/sh
  if [ "$TOTAL_PENDING_EVENTS" -eq 0 ]; then
    echo "4. No pending events"
    HAS_PENDING_EVENTS=false
  else
    echo "4. Total pending events: $TOTAL_PENDING_EVENTS"
  fi
  echo "5. Sleeping for 5 seconds..."
  sleep 5

done

echo "${STATSD_PREFIX}_done:1|c" | nc -w 1 -u localhost 8125
END=$(date +%s.%N)
# Duration is computed in seconds and sent as a StatsD timer
PRE_STOP_DURATION_SEC=$(echo "$END - $START" | bc)
echo "${STATSD_PREFIX}_time:${PRE_STOP_DURATION_SEC}|ms" | nc -w 1 -u localhost 8125

# Giving time to Telegraf to forward pre-stop metrics
sleep 60
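
The polling loop in the script above has no upper bound of its own; it relies on `terminationGracePeriodSeconds` to end it. A bounded variant could be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `wait_until_drained` is a hypothetical helper, and the command passed to it is assumed to print the pending-event count as a plain integer.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): runs a command that prints the
# pending-event count and gives up after max_attempts polls.
# Usage: wait_until_drained MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_drained() {
  max_attempts=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    pending=$("$@")
    # An empty result is treated as "still pending" rather than as zero
    if [ "${pending:-1}" -eq 0 ]; then
      echo "drained after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "gave up after $max_attempts attempt(s)"
  return 1
}

# Stand-in command that always reports zero pending events
wait_until_drained 3 0 echo 0
```

Passing the check as a command keeps the loop testable without a running backend; in the hook, the stand-in would be replaced by a function wrapping the `pending-events` request.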
@@ -115,7 +115,7 @@ Router:
BatchRouter:
  mainLoopSleep: 2s
  jobQueryBatchSize: 100000
  uploadFreq: 30s
  uploadFreqInS: 30
  warehouseServiceMaxRetryTime: 3h
  noOfWorkers: 8
  maxFailedCountForJob: 128
File renamed without changes.