Make our ship_it.yml GHA workflow resilient
As you already know, we use Dagger for CI/CD. By default, this runs on
Fly.io (via Docker). In some cases, this can fail. The last failure was
due to DNS resolution no longer working after the Docker instance was
auto-upgraded from apps v1 -> v2 (a.k.a. Fly.io machines), e.g.
https://github.com/thechangelog/changelog.com/actions/runs/5673476702/attempts/1

As a temporary fix, we had to delete some secrets and re-run the job.
The job ran on GHA free runners & failed for genuine reasons
6 mins later:
https://github.com/thechangelog/changelog.com/actions/runs/5673476702/job/15395264391

While running on the free GHA runners is ~3.5x slower (11mins vs 3mins),
it's a good fall-back. You have heard us mention on multiple occasions:
"always have redundancies in place". Since we already have multiple CI
runtimes in place, let's make our GHA workflow resilient as follows:
- Run on our preferred back-end by default (Dagger on Fly.io)
  - If it succeeds, we are done
  - If it fails, fall-back to running on the free runner
- In forks, use GitHub by default (this will be slow, but it will work)

While this means that a workflow which fails for genuine reasons will
fail twice for us (1. Dagger on Fly.io 2. Dagger on GitHub), it seems
like a better place to improve from.
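
The routing rules above can be modelled as a small shell function. This is only a sketch and not part of the repository: `plan_jobs` is a hypothetical name, the example `RUNS_ON` values (`fly k8s`, `fly`, empty) are assumptions, and the match here is case-sensitive, which is a simplification of how the workflow expressions behave.

```shell
#!/usr/bin/env bash
# Hypothetical model of the job routing in ship_it.yml (illustration only).
#   $1: value of the RUNS_ON repository variable (assumed empty in forks)
#   $2: outcome of the Dagger on Fly.io job: "success" (default) or "failure"
plan_jobs() {
  local runs_on="$1" fly_result="${2:-success}"
  local jobs=()
  if [[ "$runs_on" == *fly* ]]; then
    # Preferred back-end: Dagger on Fly.io
    jobs+=("dagger-on-fly-docker")
    # The fallback only runs when the preferred back-end failed
    if [[ "$fly_result" == "failure" ]]; then
      jobs+=("dagger-on-github-fallback")
    fi
  else
    # Forks (no Fly.io access) go straight to the free GitHub runner
    jobs+=("dagger-on-github")
  fi
  # The experimental CI-only job is independent of the other two
  if [[ "$runs_on" == *k8s* ]]; then
    jobs+=("dagger-on-k8s")
  fi
  echo "${jobs[*]}"
}

plan_jobs "fly k8s" success   # dagger-on-fly-docker dagger-on-k8s
plan_jobs "fly" failure       # dagger-on-fly-docker dagger-on-github-fallback
plan_jobs ""                  # dagger-on-github
```

The second call shows the "fails twice" case: a genuine failure runs on Fly.io first and then again on the GitHub runner.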

This change goes one step further. We are using a third back-end: Dagger
on K8s. This uses a self-hosted GitHub runner on K8s which is already
integrated with Dagger. For now, we are using it just to see how the
CI part compares to our primary setup. We are not using Dagger on
K8s to deploy the app. Let's see how this setup behaves long-term before
taking it further.

As part of this, we also beefed up how we check for Fly.io connectivity.
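
The check in this commit resolves the Docker Engine host over IPv6, pings the resolved address, pings the FQDN, and then connects to port 2375. One possible further hardening, sketched below as an assumption rather than something this commit does, is to pick exactly one address out of the `dig +short` output before pinging, since that output can contain CNAME lines or several records. `first_aaaa` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): read `dig +short <host> AAAA`
# style output on stdin, one record per line, and print the first IPv6 address.
first_aaaa() {
  local line
  while IFS= read -r line; do
    # Accept only lines made of hex digits and colons that contain a colon
    if [[ "$line" =~ ^[0-9A-Fa-f:]+$ && "$line" == *:* ]]; then
      printf '%s\n' "$line"
      return 0
    fi
  done
  echo "no AAAA record found" >&2
  return 1
}

# Possible use inside the "Check remote Docker Engine..." step (needs network):
#   addr="$(dig +short "$DOCKER_ENGINE_HOST" AAAA | first_aaaa)"
#   ping6 -c 3 "$addr"
#   nc -vz6 "$DOCKER_ENGINE_HOST" 2375
```

With this shape, an empty DNS answer fails the step with a clear message instead of passing an empty string to `ping6`.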

Signed-off-by: Gerhard Lazu <[email protected]>
gerhard committed Jul 30, 2023
1 parent 90906b2 commit 921a242
Showing 4 changed files with 169 additions and 51 deletions.
69 changes: 69 additions & 0 deletions .github/workflows/dagger_on_fly_docker.yml
@@ -0,0 +1,69 @@
+name: "Dagger on Fly.io Docker"
+
+on:
+  workflow_call:
+    secrets:
+      FLY_WIREGUARD:
+        required: true
+
+jobs:
+  run:
+    runs-on: ubuntu-latest
+    steps:
+      - name: "Checkout code..."
+        uses: actions/checkout@v3
+
+      # ⚠️ FLY_WIREGUARD is configured via `fly wireguard create ...` - see 2022.fly/docker/README.md
+      - name: "Set up WireGuard for Fly.io Docker Engine..."
+        run: |
+          echo "🔒 Install WireGuard & friends..."
+          sudo DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends wireguard-tools openresolv
+          echo "🔐 Configure WireGuard tunnel..."
+          printf "${{ secrets.FLY_WIREGUARD }}" | sudo tee /etc/wireguard/fly.conf
+          sudo wg-quick up fly
+          echo "🩻 Check IPv6 routes..."
+          sudo ip -6 route list
+          echo "🩻 Check DNS resolution..."
+          sudo resolvconf -v
+      - name: "Check remote Docker Engine..."
+        env:
+          DOCKER_ENGINE_HOST: ${{ vars.DOCKER_ENGINE_HOST }}
+        run: |
+          echo "🤨 Can we resolve ${DOCKER_ENGINE_HOST:?must be set} IPv6?"
+          dig +short "$DOCKER_ENGINE_HOST" AAAA
+          echo "🤨 Can we ping $DOCKER_ENGINE_HOST IPv6?"
+          ping6 -c 3 "$(dig +short $DOCKER_ENGINE_HOST AAAA)"
+          echo "🤨 Can we ping $DOCKER_ENGINE_HOST FQDN?"
+          ping6 -c 3 "$DOCKER_ENGINE_HOST"
+          echo "🤨 Can we connect to Docker running on $DOCKER_ENGINE_HOST?"
+          nc -vz6 "$DOCKER_ENGINE_HOST" 2375
+      - uses: actions/setup-go@v4
+        with:
+          go-version: "1.20"
+          cache-dependency-path: "magefiles/go.sum"
+
+      - name: "Build, test, publish & deploy..."
+        id: cicd
+        env:
+          DOCKER_HOST: "${{ vars.DOCKER_ENGINE_HOST_FQDN }}"
+          IMAGE_OWNER: "${{ vars.IMAGE_OWNER }}"
+          GHCR_USERNAME: "${{ github.actor }}"
+          GHCR_PASSWORD: "${{ secrets.GHCR_PASSWORD }}"
+          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
+          AWS_ACCESS_KEY_ID: "${{ secrets.AWS_ACCESS_KEY_ID }}"
+          AWS_SECRET_ACCESS_KEY: "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
+        run: |
+          cd magefiles
+          go run main.go -w ../ ci cd
+      - name: "Announce deploy in #dev Slack..."
+        if: ${{ github.repository == 'thechangelog/changelog.com' && github.ref_name == 'master' }}
+        uses: rtCamp/action-slack-notify@v2
+        env:
+          MSG_MINIMAL: "commit,actions url"
+          SLACK_CHANNEL: dev
+          SLACK_USERNAME: "GitHub Actions"
+          SLACK_FOOTER: "Just got shipped to https://changelog.com"
+          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
38 changes: 38 additions & 0 deletions .github/workflows/dagger_on_github.yml
@@ -0,0 +1,38 @@
+name: "Dagger on GitHub"
+
+on:
+  workflow_call:
+
+jobs:
+  run:
+    runs-on: ubuntu-latest
+    steps:
+      - name: "Checkout code..."
+        uses: actions/checkout@v3
+
+      - uses: actions/setup-go@v4
+        with:
+          go-version: "1.20"
+          cache-dependency-path: "magefiles/go.sum"
+
+      - name: "Build, test, publish & deploy..."
+        env:
+          IMAGE_OWNER: "${{ vars.IMAGE_OWNER }}"
+          GHCR_USERNAME: "${{ github.actor }}"
+          GHCR_PASSWORD: "${{ secrets.GHCR_PASSWORD }}"
+          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
+          AWS_ACCESS_KEY_ID: "${{ secrets.AWS_ACCESS_KEY_ID }}"
+          AWS_SECRET_ACCESS_KEY: "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
+        run: |
+          cd magefiles
+          go run main.go -w ../ ci cd
+      - name: "Announce deploy in #dev Slack..."
+        if: ${{ github.repository == 'thechangelog/changelog.com' && github.ref_name == 'master' }}
+        uses: rtCamp/action-slack-notify@v2
+        env:
+          MSG_MINIMAL: "commit,actions url"
+          SLACK_CHANNEL: dev
+          SLACK_USERNAME: "GitHub Actions"
+          SLACK_FOOTER: "Just got shipped to https://changelog.com"
+          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
32 changes: 32 additions & 0 deletions .github/workflows/dagger_on_k8s.yml
@@ -0,0 +1,32 @@
+name: "Dagger on K8s"
+
+on:
+  workflow_call:
+
+jobs:
+  run:
+    runs-on: self-hosted
+    continue-on-error: true
+    steps:
+      - name: "Checkout code..."
+        uses: actions/checkout@v3
+
+      - uses: actions/setup-go@v4
+        with:
+          go-version: "1.20"
+          cache-dependency-path: "magefiles/go.sum"
+
+      - name: "Build, test, publish & deploy..."
+        env:
+          IMAGE_OWNER: "${{ vars.IMAGE_OWNER }}"
+          GHCR_USERNAME: "${{ github.actor }}"
+          GHCR_PASSWORD: "${{ secrets.GHCR_PASSWORD }}"
+          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
+          AWS_ACCESS_KEY_ID: "${{ secrets.AWS_ACCESS_KEY_ID }}"
+          AWS_SECRET_ACCESS_KEY: "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
+        run: |
+          cd magefiles
+          go run main.go -w ../ ci
+      # TODO: run this in Dagger
+      # - name: "Announce deploy in #dev Slack..."
81 changes: 30 additions & 51 deletions .github/workflows/ship_it.yml
@@ -1,5 +1,10 @@
 name: "Ship It!"
 
+concurrency:
+  # There should only be able one running job per repository / branch combo.
+  # We do not want multiple deploys running in parallel.
+  group: ${{ github.repository }}-${{ github.ref_name }}
+
 on:
   push:
     branches:
@@ -9,58 +14,32 @@ on:
   pull_request:
   workflow_dispatch:
 
+# All jobs have the same outcome. We define multiple for resiliency reasons.
 jobs:
-  cicd:
-    runs-on: ubuntu-latest
-    steps:
-      - name: "Checkout code..."
-        uses: actions/checkout@v3
-
-      # ⚠️ FLY_WIREGUARD is configured via `fly wireguard create ...` - see 2022.fly/docker/README.md
-      - name: "Set up WireGuard for Fly.io Docker Engine..."
-        env:
-          FLY_WIREGUARD: ${{ secrets.FLY_WIREGUARD }}
-        if: "${{ env.FLY_WIREGUARD != '' }}"
-        run: |
-          sudo DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends wireguard-tools openresolv
-          printf "${{ secrets.FLY_WIREGUARD }}" | sudo tee /etc/wireguard/fly.conf
-          sudo wg-quick up fly
+  # In thechangelog/changelog repository (a.k.a. upstream),
+  # this is the preferred default:
+  dagger-on-fly-docker:
+    if: ${{ contains(vars.RUNS_ON, 'fly') }}
+    uses: ./.github/workflows/dagger_on_fly_docker.yml
+    secrets: inherit
 
-      # ⚠️ IPv6 is configured via `fly ips private` - see 2022.fly/docker/README.md
-      - name: "Check Fly.io Docker Engine"
-        env:
-          DOCKER_ENGINE_HOST: ${{ secrets.DOCKER_ENGINE_HOST }}
-        if: "${{ env.DOCKER_ENGINE_HOST != '' }}"
-        run: |
-          ping6 -c 5 "$DOCKER_ENGINE_HOST"
-          nc -vz6 "$DOCKER_ENGINE_HOST" 2375
+  # When our Fly.io setup misbehaves, we want a fallback:
+  dagger-on-github-fallback:
+    needs: dagger-on-fly-docker
+    if: ${{ failure() }}
+    uses: ./.github/workflows/dagger_on_github.yml
+    secrets: inherit
 
-      - uses: actions/setup-go@v4
-        with:
-          go-version: "1.20"
-      - name: "Build, test, publish & deploy..."
-        env:
-          DOCKER_HOST: "${{ secrets.DOCKER_ENGINE_HOST_FQDN }}"
-          IMAGE_OWNER: "${{ secrets.IMAGE_OWNER }}"
-          GHCR_USERNAME: "${{ github.actor }}"
-          GHCR_PASSWORD: "${{ secrets.GHCR_PASSWORD }}"
-          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
-          AWS_ACCESS_KEY_ID: "${{ secrets.AWS_ACCESS_KEY_ID }}"
-          AWS_SECRET_ACCESS_KEY: "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
-        run: |
-          cd magefiles
-          go run main.go -w ../ ci cd
+  # As forks will not have access to our Fly.io,
+  # we fallback to GitHub default:
+  dagger-on-github:
+    if: ${{ !contains(vars.RUNS_ON, 'fly') }}
+    uses: ./.github/workflows/dagger_on_github.yml
+    secrets: inherit
 
-  notify:
-    if: ${{ github.repository == 'thechangelog/changelog.com' && github.ref_name == 'master' }}
-    needs: cicd
-    runs-on: ubuntu-latest
-    steps:
-      - name: "Notify Slack about deploy..."
-        uses: rtCamp/action-slack-notify@v2
-        env:
-          MSG_MINIMAL: "commit,actions url"
-          SLACK_CHANNEL: dev
-          SLACK_USERNAME: "GitHub Actions"
-          SLACK_FOOTER: "Just got shipped to https://changelog.com"
-          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
+  # This is an experimental job which only runs the CI part of our pipeline.
+  # In other words, this does not run CD, it does not deploy our app.
+  dagger-on-k8s:
+    if: ${{ contains(vars.RUNS_ON, 'k8s') }}
+    uses: ./.github/workflows/dagger_on_k8s.yml
+    secrets: inherit
