Safe and Controlled GitOps Promotion Across Environments/Failure-Domains


commercetools/telefonistka

Telefonistka

Telefonistka is a GitHub webhook server/bot that facilitates change promotion across environments/failure domains in Infrastructure as Code (IaC) GitOps repos.

It assumes the repeatable part of your infrastructure is modeled in folders.

Based on configuration in the IaC repo, the bot will open pull requests that sync components from "sourcePaths" to "targetPaths".

This provides reasonably flexible control over what is promoted where and in what order.
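
The source-to-target sync idea can be illustrated with a small Go sketch. It is not Telefonistka's actual implementation: the PromotionPath type, the planSync function, and the workspace/ and env/... paths are all hypothetical names chosen for illustration. The sketch assumes the first folder below the source path is the component and that the component is copied under the same name into every target path.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// PromotionPath is a hypothetical type for this sketch: components found
// under SourcePath are synced to the same folder name under each target path.
type PromotionPath struct {
	SourcePath  string
	TargetPaths []string
}

// planSync maps a changed file to the component folder it belongs to and to
// the target folders that component would be synced to. ok is false when the
// file is outside SourcePath or does not sit inside a component folder.
func planSync(p PromotionPath, changedFile string) (source string, targets []string, ok bool) {
	src := strings.TrimSuffix(p.SourcePath, "/") + "/"
	if !strings.HasPrefix(changedFile, src) {
		return "", nil, false
	}
	rest := strings.TrimPrefix(changedFile, src)
	// Assumption: the first path element below the source path is the component.
	component, _, found := strings.Cut(rest, "/")
	if !found || component == "" {
		return "", nil, false
	}
	for _, t := range p.TargetPaths {
		targets = append(targets, path.Join(t, component))
	}
	return path.Join(src, component), targets, true
}

func main() {
	p := PromotionPath{
		SourcePath:  "workspace/",
		TargetPaths: []string{"env/staging", "env/prod"},
	}
	source, targets, ok := planSync(p, "workspace/nginx/values.yaml")
	fmt.Println(source, targets, ok)
}
```

Each returned target folder corresponds to a directory that a promotion pull request would bring in line with the source component.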

A 10-minute ArgoCon EU 2023 session describing the project:

ArgoCon EU 2023 session

Modeling environments/failure-domains in an IaC GitOps repo

RY is the new DRY!

In GitOps IaC implementations, different environments (dev/prod/...) and failure domains (us-east-1/us-west-1/...) must be represented in distinct files, folders, Git branches, or even repositories to allow gradual and controlled rollout of changes across those environments/failure domains.

On Wayfair's Kubernetes team we chose the "folders" approach; more about other choices here.

Specifically, we chose the following scheme to represent all the infrastructure components running in our Kubernetes clusters: clusters/[environment]/[cloud region]/[cluster identifier]/[component name]

For example:

clusters/staging/us-central1/c2/prometheus/
clusters/staging/us-central1/c2/nginx-ingress/
clusters/prod/us-central1/c2/prometheus/
clusters/prod/us-central1/c2/nginx-ingress/
clusters/prod/europe-west4/c2/prometheus/
clusters/prod/europe-west4/c2/nginx-ingress/
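
To make the scheme concrete, here is a minimal Go sketch that splits such a path into its parts. The ComponentPath type and parseComponentPath function are hypothetical helpers written for this illustration, not part of Telefonistka.

```go
package main

import (
	"fmt"
	"strings"
)

// ComponentPath holds the parts of a path that follows the
// clusters/[environment]/[cloud region]/[cluster identifier]/[component name] scheme.
type ComponentPath struct {
	Environment string
	Region      string
	Cluster     string
	Component   string
}

// parseComponentPath accepts a component folder or any file below it and
// returns the four scheme fields, or an error if the path does not fit.
func parseComponentPath(p string) (ComponentPath, error) {
	parts := strings.Split(strings.Trim(p, "/"), "/")
	if len(parts) < 5 || parts[0] != "clusters" {
		return ComponentPath{}, fmt.Errorf("path %q does not follow the clusters/... scheme", p)
	}
	for _, s := range parts[1:5] {
		if s == "" {
			return ComponentPath{}, fmt.Errorf("path %q has an empty element", p)
		}
	}
	return ComponentPath{
		Environment: parts[1],
		Region:      parts[2],
		Cluster:     parts[3],
		Component:   parts[4],
	}, nil
}

func main() {
	cp, err := parseComponentPath("clusters/prod/europe-west4/c2/prometheus/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cp)
}
```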

While this approach provides multiple benefits, it does mean the user is expected to make changes in multiple files and folders in order to apply a single change to multiple environments/FDs.

Manually syncing those files is time-consuming, error-prone, and generally not fun. And in the long run, undesired drift between those environments/FDs is almost guaranteed to accumulate as humans do that thing where they fail to be perfect at what they do.

This is where Telefonistka comes in.

Telefonistka will automagically create pull requests that "sync" our changes to the right folder or folders, enabling the usage of the familiar PR functionality to control promotions while avoiding the toil related to manually syncing directories and checking for environments/FDs drift.

Notable Features

IaC stack agnostic

Terraform, Helmfile, ArgoCD, whatever, as long as environments and sites are modeled as folders and components are copied between environments "as is".

Unopinionated directory structure

The in-repo configuration file is flexible and even has some regex support.

The project's goal is to support any reasonable setup, and we'll try to address unsupported setups.

Multi stage promotion schemes

lab -> staging -> production

or

dev -> production-us-east-1 -> production-us-east-3 -> production-eu-east-1

Fan out, like:

lab -> staging1 -->
       staging2 -->  production
       staging3 -->

Telefonistka annotates the PR with the historic "flow" of the promotion.

Control granularity of promotion PRs

Allows separating promotions into separate PRs per environment/failure domain, or grouping some or all of them.

e.g. "Sync all dev clusters in one PR but open a dedicated PR for every production cluster"

Also allows automatic merging of PRs based on the promotion policy.

e.g. "Automatically merge PRs that promote to multiple lab environments"

Optional per-component allow/block override list

Allows overriding the general (per-repo) promotion policy at the per-component level.

e.g. "This component should not be deployed to production" or "Promote this only to the us-east-4 region"

Drift detection and warning

Warns the user about drift between environments/failure domains on open PRs (e.g. "Staging and Production are not synced, these are the differences").

ArgoCD integration

Telefonistka can compare manifests in PR branches to live objects in the clusters and comment on the differences in PRs.

Artifact version bumping from CLI

If your IaC repo deploys software you maintain internally, you probably want to automate artifact version bumping. Telefonistka can automate opening the IaC repo PR for the version change from the code repo pipeline:

telefonistka bump-overwrite \
    --target-repo Oded-B/telefonistka-example \
    --target-file workspace/nginx/values-version.yaml \
    --file <(echo -e "image:\n  tag: v3.4.9")

It currently supports full-file overwrite, regex, and YAML-based replacement. See here for more details.

GitHub Push events fanout/multiplexing

Some GitOps operators can listen for GitHub webhooks to ensure short delays in the reconciliation loop.

But in some scenarios the number of webhook endpoints needed exceeds the maximum supported by GitHub (think 10 clusters, each with an in-cluster ArgoCD server and ArgoCD ApplicationSet controller).

Telefonistka can forward these HTTP requests to multiple endpoints and can even filter or dynamically choose the endpoint URL based on the files changed in the commit.

This example configuration includes regex-based endpoint URL generation:

webhookEndpointRegexs:
  - expression: "^workspace/[^/]*/.*"
    replacements:
      - "https://kube-argocd-c1.service.lab.example.com/api/webhook"
      - "https://kube-argocd-applicationset-c1.service.lab.example.com/api/webhook"
      - "https://example.com"
  - expression: "^clusters/([^/]*)/([^/]*)/([^/]*)/.*"
    replacements:
      - "https://kube-argocd-${3}.${1}.service.${2}.example.com/api/webhook"
      - "https://kube-argocd-applicationset-${2}.service.${1}.example.com/api/webhook"

See here for more details.

Installation and Configuration

See here

Observability

See here

Development

Local Testing

Telefonistka has three major ways to interact with the world:

  • Receive event webhooks from GitHub
  • Send API calls to GitHub REST and GraphQL APIs(requires network access and credentials)
  • Send API calls to ArgoCD API(requires network access and credentials)

Supporting all those requirements in a local environment might require lots of setup. Assuming you have a working lab environment, the easiest way to locally test Telefonistka might be with tools like mirrord or telepresence.

A mirrord.json is supplied as reference.

This is how I compile and trigger mirrord execution:

go build . && mirrord exec -f mirrord.json ./telefonistka server

Alternatively, you can use ngrok or similar services to route webhooks to a local instance, but you still need to provide credentials for all outbound API calls.

  • Use ngrok (ngrok http 8080) to expose the local instance.
  • See the URLs in the ngrok command output.
  • Add a webhook in the repo settings (don't forget the /webhook path in the URL).
  • Content type needs to be application/json; currently only PR events are needed.

Building Container Image From Forks

To publish container images from a forked repo, set the IMAGE_NAME and REGISTRY GitHub Actions repository variables to use GitHub Packages. REGISTRY should be ghcr.io and IMAGE_NAME should match the repository slug.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. For detailed contributing guidelines, please see CONTRIBUTING.md.

License

Distributed under the MIT License. See LICENSE for more information.