rudderlabs · afranzi · Jan 31, 2022 · Feb 1, 2022 · Feb 1, 2022 · Feb 1, 2022
diff --git a/.github/workflows/ci-release.yml b/.github/workflows/ci-release.yml
@@ -0,0 +1,26 @@
+name: CI Release Charts
+
+on:
+  push:
+    branches:
+      - main
+
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+
+      - name: Configure Git
+        run: |
+          git config user.name "$GITHUB_ACTOR"
+          git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
+
+      - name: Run chart-releaser
+        uses: helm/[email protected]
+        env:
+          CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
+          CR_SKIP_EXISTING: true
diff --git a/.gitignore b/.gitignore
@@ -1 +1,9 @@
-*.DS_Store
+**.DS_Store
+_config.yml
+
+# Helm
+**/Chart.lock
+*/*/charts/
+
+# Idea
+.idea
diff --git a/CODEOWNERS b/CODEOWNERS
@@ -0,0 +1 @@
+* @Typeform/data-engineering
diff --git a/README.md b/README.md
@@ -1,27 +1,30 @@
 # What is RudderStack?
 
-[RudderStack](https://rudderstack.com/) is a **customer data pipeline** tool for collecting, routing and processing data from your websites, apps, cloud tools, and data warehouse.
+[RudderStack](https://rudderstack.com/) is a **customer data pipeline** tool for collecting, routing and processing data
+from your websites, apps, cloud tools, and data warehouse.
 
 More information on RudderStack can be found [here](https://github.com/rudderlabs/rudder-server).
 
 ## TL;DR;
 
 ```bash
 $ git clone git@github.com:rudderlabs/rudderstack-helm.git
-$ cd rudderstack-helm/
+$ cd rudderstack-helm/charts/rudderstack
+$ helm dependency build
 $ helm install my-release ./ --set rudderWorkspaceToken="<workspace token from the dashboard>"
 ```
 
 ## Introduction
 
-The RudderStack Helm chart creates a Rudderstack deployment on a [Kubernetes](http://kubernetes.io) cluster
-using the [Helm](https://helm.sh) package manager.
+The RudderStack Helm chart creates a RudderStack deployment on a [Kubernetes](http://kubernetes.io) cluster using
+the [Helm](https://helm.sh) package manager.
 
 ## Prerequisites
 
 - Kubectl installed and connected to your kubernetes cluster
 - Helm installed
-- Workspace token from the [RudderStack dashboard](https://app.rudderstack.com). Set up your account and copy your workspace token from the top of the home page.
+- Workspace token from the [RudderStack dashboard](https://app.rudderstack.com). Set up your account and copy your
+  workspace token from the top of the home page.
 
 ## Installing the Chart
 
@@ -31,7 +34,9 @@ To install the chart with the release name `my-release`, from the root directory
 $ helm install my-release ./ --set rudderWorkspaceToken="<workspace token from the dashboard>"
 ```
 
-The command deploys Rudderstack on the default Kubernetes cluster configured with `kubectl`. The [configuration](#configuration) section lists the most significant parameters that can be configured during deployment.
+The command deploys RudderStack on the default Kubernetes cluster configured with `kubectl`.
+The [configuration](#configuration) section lists the most significant parameters that can be configured during
+deployment.
 
 ## Upgrading the Chart
 
@@ -51,21 +56,104 @@ $ helm uninstall my-release
 
 This removes all the components created by this chart.
 
+## Developing the Chart
+
+To verify that proposed changes render and would apply cleanly, run a client-side dry run:
+
+```bash
+helm template ./ | kubectl apply --dry-run=client -f -
+```
+
+## Postgres dependency
+
+There are three options for providing the Postgres dependency:
+
+- Deploying it as a **sidecar** in the same stateful resource.
+- Deploying a new StatefulSet with Postgres.
+- Providing an external Postgres.
+
+### Sidecar mode
+
+To enable the sidecar mode, specify:
+
+```yaml
+postgresql:
+  mode: sidecar
+  statefulset_enabled: false
+```
+
+### Stateful mode
+
+To enable the statefulset mode, specify:
+
+```yaml
+postgresql:
+  mode: statefulset
+  statefulset_enabled: true
+```
+
+## HPA : Horizontal Pod Autoscaling
+
+> Only recommended with **postgresql sidecar mode enabled**.
+
+> Currently only supported with `backend.controlPlaneJSON: true`, because the **[pre-stop hook](charts/rudderstack/pre-stop.sh)**
+> reads the source IDs from the local config file and waits until every source reports zero pending events, so no
+> event is lost when scaling down.
+
+Horizontal Pod Autoscaling can be enabled when resource efficiency is a requirement.
+
+```yaml
+backend:
+  terminationGracePeriodSeconds: xx
+  lifecycleSleepTime: xx
+  hpa:
+    enabled: true
+```
+
+Also make sure that both `lifecycleSleepTime` and `terminationGracePeriodSeconds` are greater than
+`BatchRouter.uploadFreqInS`; otherwise Kubernetes will kill the pods before they flush their data to the destinations.
+
 ## Open-source Control Plane
 
-If you are using open-source config-generator UI, you need to set the parameter `controlPlaneJSON` to `true` in the `values.yaml` file. Export workspace-config from the config-generator and copy/paste the contents into the `workspaceConfig.json` file.
+If you are using the open-source config-generator UI, set the parameter `controlPlaneJSON` to `true` in
+the `values.yaml` file. Then export the workspace config from the config-generator and paste its contents into
+the `workspaceConfig.json` file.
 
 ```bash
 $ helm install my-release ./ --set backend.controlPlaneJSON=true
  ```
 
+## Extending the Chart
+
+Since the Chart is published under the {{ TBC by the RudderStack team }} page, you can extend it by adding it as a
+dependency of your own Chart. There is no need to clone this repo to deploy RudderStack open-source into your
+infrastructure.
+
+```yaml
+apiVersion: v2
+name: rudderstack
+description: Customer Data Pipeline tool for collecting, routing and processing data.
+maintainers:
+  - name: Data Platform
+    email: [email protected]
+version: 0.4.5
+appVersion: 1.16.0
+dependencies:
+  # https://github.com/rudderlabs/rudderstack-helm
+  - name: rudderstack
+    version: 0.4.5
+    repository: https://TBC.github.io/rudderstack-helm # To Be Confirmed by the RudderStack team
+```
+
 ## GCP
 
-If you are using Google Cloud Storage or Google BigQuery for the following cases, you have to replace the contents of the file [rudder-google-application-credentials.json](rudder-google-application-credentials.json) with your service account:
+If you are using Google Cloud Storage or Google BigQuery for the following cases, you have to replace the contents of
+the file [rudder-google-application-credentials.json](charts/rudderstack/rudder-google-application-credentials.json)
+with your service account:
 
- - GCS as a destination
- - GCS for dumping jobs
- - BigQuery as a warehouse destination.
+- GCS as a destination
+- GCS for dumping jobs
+- BigQuery as a warehouse destination.
 
 ## Configuration
 
@@ -83,7 +171,8 @@ The following table lists the configurable parameters of the Rudderstack chart a
 | `backend.extraEnvVars`              | Extra environments variables to be used by the backend in the deployments                           | `Refer values.yaml file` |
 | `backend.controlPlaneJSON`                   | If `true`, backend will read config from the workspaceConfig.json file  |  `false` |
 
-Each of these parameters can be changed in `values.yaml`. Or specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example:
+Each of these parameters can be changed in `values.yaml`. Alternatively, specify each parameter using
+the `--set key=value[,key=value]` argument to `helm install`. For example:
 
 ```bash
 $ helm install --name my-release \
@@ -92,6 +181,7 @@ $ helm install --name my-release \
 ```
 
 **Note:** Configuration specific to:
+
 - Backend can be edited in [rudder-config.yaml](https://docs.rudderlabs.com/administrators-guide/config-parameters).
 - PostgreSQL can be edited in `pg_hba.conf`, `postgresql.conf`
 
@@ -100,15 +190,16 @@ $ helm install --name my-release \
 Installing this Helm chart will deploy the following pods and containers in the configured cluster:
 
 #### POD - {Release name}-rudderstack-0 :
+
 - rudderstack-backend
 - rudderstack-telegraf-sidecar
-
-#### POD - {Release name}-rudderstack-postgresql-0 :
-- {Release name}-rudderstack-postgresql
+- rudderstack-postgresql-sidecar
 
 #### POD - {Release name}-rudderstack-transformer-xxxxxxxxxx-xxxxx:
+
 - transformer
 
 ## Contact Us
 
-For any queries related to using the RudderStack Helm Chart, feel free to start a conversation on our [Slack](https://resources.rudderstack.com/join-rudderstack-slack) channel.
+For any queries related to using the RudderStack Helm Chart, feel free to start a conversation on
+our [Slack](https://resources.rudderstack.com/join-rudderstack-slack) channel.
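Review note on the Postgres modes documented in the README: the two value pairs (`postgresql.mode` and `postgresql.statefulset_enabled`) have to be set together, so a small wrapper may help avoid mismatches. The following is a minimal sketch, not part of this PR; the function name `postgres_mode_flags` is hypothetical, and the external Postgres option is left out because its values are not shown in this diff.

```shell
#!/bin/sh
# Hypothetical helper: map a Postgres mode to the matching --set flags,
# using only the value keys shown in the README of this PR.
postgres_mode_flags() {
    case "$1" in
        sidecar)
            echo "--set postgresql.mode=sidecar --set postgresql.statefulset_enabled=false" ;;
        statefulset)
            echo "--set postgresql.mode=statefulset --set postgresql.statefulset_enabled=true" ;;
        *)
            # Unknown modes (including external Postgres) are rejected here.
            echo "unknown mode: $1" >&2
            return 1 ;;
    esac
}

# Print the command instead of running it, so the sketch has no side effects.
echo "helm install my-release ./ $(postgres_mode_flags sidecar)"
```

Keeping both flags behind one mode name means `mode` and `statefulset_enabled` cannot drift apart on the command line.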
diff --git a/Chart.yaml → charts/rudderstack/Chart.yaml b/Chart.yaml → charts/rudderstack/Chart.yaml
@@ -14,21 +14,19 @@ type: application
 
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
-version: 0.3.0
+version: 0.4.9
 
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application.
 appVersion: 1.16.0
 
-# WIP
-#dependencies: 
-#  - name: nginx-ingress
-#    version: ~1.6.0
-#    repository: https://helm.nginx.com/stable
-#    condition: (optional) A yaml path that resolves to a boolean, used for enabling/disabling charts (e.g. subchart1.enabled )
-#    tags: # (optional)
-#      - Tags can be used to group charts for enabling/disabling together
-#    enabled: (optional) Enabled bool determines if chart should be loaded
-#    import-values: # (optional)
-#      - ImportValues holds the mapping of source values to parent key to be imported. Each item can be a string or pair of child/parent sublist items.
-#    alias: (optional) Alias usable alias to be used for the chart. Useful when you have to add the same chart multiple times
+dependencies:
+  - name: transformer
+    version: 0.1.0
+    repository: file://../subcharts/transformer
+    condition: transformer.enabled
+
+  - name: postgresql
+    version: 7.7.2
+    repository: file://../subcharts/postgresql
+    condition: postgresql.statefulset_enabled
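Review note on versioning: the comment in `Chart.yaml` asks for a version bump on every chart change, and the release workflow sets `CR_SKIP_EXISTING`, so an unbumped chart would presumably not be published again. A pre-merge check could look roughly like the sketch below. The function names `chart_version` and `version_changed` are hypothetical, and the check only detects that the version differs, not that it increased.

```shell
#!/bin/sh
# Hypothetical helper: print the top-level `version:` of a Chart.yaml.
# Anchoring on ^version: skips the indented versions under `dependencies:`
# and does not match `appVersion:`.
chart_version() {
    awk '/^version:/ { print $2; exit }' "$1"
}

# Succeed only when the two version strings differ.
# This does not compare ordering, only that a bump happened.
version_changed() {
    [ "$1" != "$2" ]
}

# Example usage (the old version would come from the base branch):
#   new=$(chart_version charts/rudderstack/Chart.yaml)
#   version_changed "0.3.0" "$new" || echo "Chart version was not bumped" >&2
```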
diff --git a/pg_hba.conf → charts/rudderstack/pg_hba.conf b/pg_hba.conf → charts/rudderstack/pg_hba.conf
diff --git a/postgresql.conf → charts/rudderstack/postgresql.conf b/postgresql.conf → charts/rudderstack/postgresql.conf
diff --git a/charts/rudderstack/pre-stop.sh b/charts/rudderstack/pre-stop.sh
@@ -0,0 +1,50 @@
+#!/bin/sh
+START=$(date +%s.%N)
+
+# Install JQ
+JQ=/usr/bin/jq
+curl -sLo $JQ https://stedolan.github.io/jq/download/linux64/jq
+chmod +x $JQ
+
+# Extract source IDs
+SOURCE_IDS=$(jq -r '.sources[].id' "$RSERVER_BACKEND_CONFIG_CONFIG_JSONPATH")
+STATSD_PREFIX="rudder_server_pre_stop"
+
+# Emit Start datum to Telegraf
+echo "${STATSD_PREFIX}_start:1|c" | nc -w 1 -u localhost 8125
+
+# Check pending events per Source ID
+HAS_PENDING_EVENTS=true
+while $HAS_PENDING_EVENTS ; do
+    TOTAL_PENDING_EVENTS=0
+    echo "1. Checking pending events...."
+    echo "${STATSD_PREFIX}_check:1|c" | nc -w 1 -u localhost 8125
+
+    for source_id in $SOURCE_IDS; do
+      PENDING_EVENTS=$(curl -s --request POST --url http://localhost:8080/v1/pending-events \
+           -u ${source_id}:potato --header 'Content-Type: application/json' \
+           --data "{\"source_id\": \"${source_id}\"}" | jq .pending_events)
+      echo "2. Pending $PENDING_EVENTS events for Source [$source_id]"
+      TOTAL_PENDING_EVENTS=$(( $TOTAL_PENDING_EVENTS + $PENDING_EVENTS ))
+    done
+
+    echo "3. Checking result..."
+    echo "${STATSD_PREFIX}_pending_events:$TOTAL_PENDING_EVENTS|g" | nc -w 1 -u localhost 8125
+    if [ "${TOTAL_PENDING_EVENTS}" -eq 0 ]; then
+       echo "4. No pending events"
+       HAS_PENDING_EVENTS=false
+    else
+       echo "4. Total pending events: $TOTAL_PENDING_EVENTS"
+    fi
+    echo "5. Sleeping for 5 seconds..."
+    sleep 5s
+
+done
+
+echo "${STATSD_PREFIX}_done:1|c" | nc -w 1 -u localhost 8125
+END=$(date +%s.%N)
+PRE_STOP_DURATION_MS=$(echo "($END - $START) * 1000 / 1" | bc)
+echo "${STATSD_PREFIX}_time:${PRE_STOP_DURATION_MS}|ms" | nc -w 1 -u localhost 8125
+
+# Giving time to Telegraf to forward pre-stop metrics
+sleep 60s
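Review note on the summing loop in `pre-stop.sh`: if `curl` returns nothing, `PENDING_EVENTS` is empty and the `$(( ... ))` expression becomes a syntax error; if the response lacks the key, `jq` prints `null`. One way to make the accumulation robust is sketched below. The helper name `add_pending` is hypothetical, and treating an unreadable reading as 0 is an assumption: it could let the hook finish early, so failing or retrying may be the better policy for this chart.

```shell
#!/bin/sh
# Hypothetical helper: add one reading to a running total.
# Empty, "null" or otherwise non-numeric readings count as 0.
add_pending() {
    _sum=$1
    _val=$2
    case "$_val" in
        ''|*[!0-9]*) _val=0 ;;   # anything that is not purely digits
    esac
    echo $(( _sum + _val ))
}

# Accumulate a mix of valid and invalid readings.
total=0
for reading in 12 null "" 30; do
    total=$(add_pending "$total" "$reading")
done
echo "Total pending events: $total"
```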
diff --git a/rudder-config.yaml → charts/rudderstack/rudder-config.yaml b/rudder-config.yaml → charts/rudderstack/rudder-config.yaml
@@ -115,7 +115,7 @@ Router:
 BatchRouter:
   mainLoopSleep: 2s
   jobQueryBatchSize: 100000
-  uploadFreq: 30s
+  uploadFreqInS: 30
   warehouseServiceMaxRetryTime: 3h
   noOfWorkers: 8
   maxFailedCountForJob: 128
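Review note on `uploadFreqInS`: the README in this PR requires `lifecycleSleepTime` and `terminationGracePeriodSeconds` to be greater than this value (30 here). A deployment script could assert that before installing. This is a minimal sketch; the function name `grace_period_ok` and the sample numbers 120 and 60 are illustrative assumptions, not values from the chart.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when both timing values (in seconds)
# exceed the BatchRouter upload interval.
#   $1 = terminationGracePeriodSeconds
#   $2 = lifecycleSleepTime
#   $3 = BatchRouter.uploadFreqInS
grace_period_ok() {
    [ "$1" -gt "$3" ] && [ "$2" -gt "$3" ]
}

if grace_period_ok 120 60 30; then
    echo "ok: pods get time to flush before termination"
else
    echo "too short: raise terminationGracePeriodSeconds and lifecycleSleepTime" >&2
fi
```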

diff --git a/rudder-google-application-credentials.json → ...udder-google-application-credentials.json b/rudder-google-application-credentials.json → ...udder-google-application-credentials.json
diff --git a/templates/NOTES.txt → charts/rudderstack/templates/NOTES.txt b/templates/NOTES.txt → charts/rudderstack/templates/NOTES.txt