Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Refactor Dockerfile & entrypoint #8923

Open
wants to merge 29 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
64be0ae
Refactor formatting & docs
upbqdn Oct 2, 2024
a248b14
Refactor the `runtime` stage in Dockerfile
upbqdn Oct 2, 2024
ca1620c
Remove unused code from `entrypoint.sh`
upbqdn Oct 9, 2024
ea8b119
Simplify `entrypoint.sh` setup
upbqdn Oct 9, 2024
7cba8cf
Revise docs & formatting
upbqdn Oct 9, 2024
56c65e6
Adjust default values for env vars
upbqdn Oct 9, 2024
be38132
Bump Rust v from 1.79 to 1.81 in Dockerfile
upbqdn Oct 9, 2024
0492b7a
Refactor `entrypoint.sh`
upbqdn Oct 10, 2024
6595740
Refactor `Dockerfile`
upbqdn Oct 10, 2024
22dc738
Add TODOs for monitoring stage to Dockerfile
upbqdn Oct 10, 2024
962d4d3
Merge branch 'main' into docker-refactor
upbqdn Oct 10, 2024
621754b
Refactor `Dockerfile`
upbqdn Oct 10, 2024
6b68592
Add TODOs for monitoring stage to Dockerfile
upbqdn Oct 10, 2024
c788ccc
Merge branch 'docker-refactor' of github.com:ZcashFoundation/zebra in…
upbqdn Oct 10, 2024
38837e4
Fix a typo
upbqdn Oct 10, 2024
b58a602
Merge branch 'main' into docker-refactor
upbqdn Oct 10, 2024
2ada296
Allow running `zebrad` in test mode
upbqdn Oct 11, 2024
99cd18f
Merge branch 'docker-refactor' of github.com:ZcashFoundation/zebra in…
upbqdn Oct 11, 2024
c718609
Merge branch 'main' into docker-refactor
upbqdn Oct 11, 2024
69b03d4
Allow custom config for `zebrad` in test mode
upbqdn Oct 11, 2024
6932d9a
Remove `curl` from the `runtime` Docker image
upbqdn Oct 11, 2024
6fe460d
Remove redundant echos
upbqdn Oct 11, 2024
c5010b8
Remove a malfunctioning CD test
upbqdn Oct 12, 2024
e05df78
Remove a redundant CI test
upbqdn Oct 12, 2024
e9f0479
Remove all packages from the `runtime` stage
upbqdn Oct 12, 2024
7422ecf
Merge branch 'main' into docker-refactor
upbqdn Oct 14, 2024
4fa064c
Docs cosmetics
upbqdn Oct 14, 2024
afeb05f
Merge branch 'main' into docker-refactor
upbqdn Oct 18, 2024
2a72a07
Merge branch 'main' into docker-refactor
upbqdn Jan 10, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
127 changes: 61 additions & 66 deletions .github/workflows/cd-deploy-nodes-gcp.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Google Cloud node deployments and tests that run when Rust code or dependencies are modified,
# but only on PRs from the ZcashFoundation/zebra repository.
# (External PRs are tested/deployed by GitHub's Merge Queue.)
# (External PRs are tested/deployed by GitHub's Merge Queue.)
#
# 1. `versioning`: Extracts the major version from the release semver. Useful for segregating instances based on major versions.
# 2. `build`: Builds a Docker image named `zebrad` with the necessary tags derived from Git.
Expand Down Expand Up @@ -34,79 +34,79 @@ on:
inputs:
network:
default: Mainnet
description: 'Network to deploy: Mainnet or Testnet'
description: "Network to deploy: Mainnet or Testnet"
required: true
type: choice
options:
- Mainnet
- Testnet
cached_disk_type:
default: tip
description: 'Type of cached disk to use'
description: "Type of cached disk to use"
required: true
type: choice
options:
- tip
- checkpoint
prefer_main_cached_state:
default: false
description: 'Prefer cached state from the main branch'
description: "Prefer cached state from the main branch"
required: false
type: boolean
no_cached_disk:
default: false
description: 'Do not use a cached state disk'
description: "Do not use a cached state disk"
required: false
type: boolean
no_cache:
description: 'Disable the Docker cache for this build'
description: "Disable the Docker cache for this build"
required: false
type: boolean
default: false
log_file:
default: ''
description: 'Log to a file path rather than standard output'
default: ""
description: "Log to a file path rather than standard output"

push:
# Skip main branch updates where Rust code and dependencies aren't modified.
branches:
- main
paths:
# code and tests
- '**/*.rs'
# hard-coded checkpoints and proptest regressions
- '**/*.txt'
# dependencies
- '**/Cargo.toml'
- '**/Cargo.lock'
# configuration files
- '.cargo/config.toml'
- '**/clippy.toml'
# workflow definitions
- 'docker/**'
- '.dockerignore'
- '.github/workflows/cd-deploy-nodes-gcp.yml'
- '.github/workflows/sub-build-docker-image.yml'
# Skip main branch updates where Rust code and dependencies aren't modified.
branches:
- main
paths:
# code and tests
- "**/*.rs"
# hard-coded checkpoints and proptest regressions
- "**/*.txt"
# dependencies
- "**/Cargo.toml"
- "**/Cargo.lock"
# configuration files
- ".cargo/config.toml"
- "**/clippy.toml"
# workflow definitions
- "docker/**"
- ".dockerignore"
- ".github/workflows/cd-deploy-nodes-gcp.yml"
- ".github/workflows/sub-build-docker-image.yml"

# Only runs the Docker image tests, doesn't deploy any instances
pull_request:
# Skip PRs where Rust code and dependencies aren't modified.
paths:
# code and tests
- '**/*.rs'
- "**/*.rs"
# hard-coded checkpoints and proptest regressions
- '**/*.txt'
- "**/*.txt"
# dependencies
- '**/Cargo.toml'
- '**/Cargo.lock'
- "**/Cargo.toml"
- "**/Cargo.lock"
# configuration files
- '.cargo/config.toml'
- '**/clippy.toml'
- ".cargo/config.toml"
- "**/clippy.toml"
# workflow definitions
- 'docker/**'
- '.dockerignore'
- '.github/workflows/cd-deploy-nodes-gcp.yml'
- '.github/workflows/sub-build-docker-image.yml'
- "docker/**"
- ".dockerignore"
- ".github/workflows/cd-deploy-nodes-gcp.yml"
- ".github/workflows/sub-build-docker-image.yml"

release:
types:
Expand Down Expand Up @@ -183,11 +183,11 @@ jobs:
needs: build
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'default-conf'
test_id: "default-conf"
docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
grep_patterns: '-e "net.*=.*Main.*estimated progress to chain tip.*BeforeOverwinter"'
test_variables: '-e NETWORK'
network: 'Mainnet'
test_variables: "-e NETWORK"
network: "Mainnet"

# Test reconfiguring the docker image for testnet.
test-configuration-file-testnet:
Expand All @@ -196,23 +196,11 @@ jobs:
# Make sure Zebra can sync the genesis block on testnet
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'testnet-conf'
test_id: "testnet-conf"
docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
grep_patterns: '-e "net.*=.*Test.*estimated progress to chain tip.*Genesis" -e "net.*=.*Test.*estimated progress to chain tip.*BeforeOverwinter"'
test_variables: '-e NETWORK'
network: 'Testnet'

# Test that Zebra works using $ZEBRA_CONF_PATH config
test-zebra-conf-path:
name: Test CD custom Docker config file
needs: build
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'custom-conf'
docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
grep_patterns: '-e "loaded zebrad config.*config_path.*=.*v1.0.0-rc.2.toml"'
test_variables: '-e NETWORK -e ZEBRA_CONF_PATH="zebrad/tests/common/configs/v1.0.0-rc.2.toml"'
network: ${{ inputs.network || vars.ZCASH_NETWORK }}
test_variables: "-e NETWORK"
network: "Testnet"

# Deploy Managed Instance Groups (MiGs) for Mainnet and Testnet,
# with one node in the configured GCP region.
Expand All @@ -234,14 +222,21 @@ jobs:
matrix:
network: [Mainnet, Testnet]
name: Deploy ${{ matrix.network }} nodes
needs: [ build, versioning, test-configuration-file, test-zebra-conf-path, get-disk-name ]
needs:
[
build,
versioning,
test-configuration-file,
test-zebra-conf-path,
get-disk-name,
]
runs-on: ubuntu-latest
timeout-minutes: 60
env:
CACHED_DISK_NAME: ${{ needs.get-disk-name.outputs.cached_disk_name }}
permissions:
contents: 'read'
id-token: 'write'
contents: "read"
id-token: "write"
if: ${{ !cancelled() && !failure() && ((github.event_name == 'push' && github.ref_name == 'main') || github.event_name == 'release') }}

steps:
Expand Down Expand Up @@ -270,8 +265,8 @@ jobs:
id: auth
uses: google-github-actions/[email protected]
with:
workload_identity_provider: '${{ vars.GCP_WIF }}'
service_account: '${{ vars.GCP_DEPLOYMENTS_SA }}'
workload_identity_provider: "${{ vars.GCP_WIF }}"
service_account: "${{ vars.GCP_DEPLOYMENTS_SA }}"

- name: Set up Cloud SDK
uses: google-github-actions/[email protected]
Expand Down Expand Up @@ -344,14 +339,14 @@ jobs:
# Note: this instances are not automatically replaced or deleted
deploy-instance:
name: Deploy single ${{ inputs.network }} instance
needs: [ build, test-configuration-file, test-zebra-conf-path, get-disk-name ]
needs: [build, test-configuration-file, test-zebra-conf-path, get-disk-name]
runs-on: ubuntu-latest
timeout-minutes: 30
env:
CACHED_DISK_NAME: ${{ needs.get-disk-name.outputs.cached_disk_name }}
permissions:
contents: 'read'
id-token: 'write'
contents: "read"
id-token: "write"
# Run even if we don't need a cached disk, but only when triggered by a workflow_dispatch
if: ${{ !failure() && github.event_name == 'workflow_dispatch' }}

Expand Down Expand Up @@ -381,8 +376,8 @@ jobs:
id: auth
uses: google-github-actions/[email protected]
with:
workload_identity_provider: '${{ vars.GCP_WIF }}'
service_account: '${{ vars.GCP_DEPLOYMENTS_SA }}'
workload_identity_provider: "${{ vars.GCP_WIF }}"
service_account: "${{ vars.GCP_DEPLOYMENTS_SA }}"

- name: Set up Cloud SDK
uses: google-github-actions/[email protected]
Expand Down Expand Up @@ -423,7 +418,7 @@ jobs:
failure-issue:
name: Open or update issues for release failures
# When a new job is added to this workflow, add it to this list.
needs: [ versioning, build, deploy-nodes, deploy-instance ]
needs: [versioning, build, deploy-nodes, deploy-instance]
# Only open tickets for failed or cancelled jobs that are not coming from PRs.
# (PR statuses are already reported in the PR jobs list, and checked by GitHub's Merge Queue.)
if: (failure() && github.event.pull_request == null) || (cancelled() && github.event.pull_request == null)
Expand Down
13 changes: 0 additions & 13 deletions .github/workflows/sub-ci-unit-tests-docker.yml
Original file line number Diff line number Diff line change
Expand Up @@ -151,19 +151,6 @@ jobs:
test_variables: '-e NETWORK'
network: 'Mainnet'

# Test reconfiguring the the docker image for tesnet.
test-configuration-file-testnet:
name: Test CI testnet Docker config file
# Make sure Zebra can sync the genesis block on testnet
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'testnet-conf'
docker_image: ${{ vars.GAR_BASE }}/${{ vars.CI_IMAGE_NAME }}@${{ inputs.image_digest }}
grep_patterns: '-e "net.*=.*Test.*estimated progress to chain tip.*Genesis" -e "net.*=.*Test.*estimated progress to chain tip.*BeforeOverwinter"'
# TODO: improve the entrypoint to avoid using `ENTRYPOINT_FEATURES=""`
test_variables: '-e NETWORK -e ZEBRA_CONF_PATH="/etc/zebrad/zebrad.toml" -e ENTRYPOINT_FEATURES=""'
network: 'Testnet'

Comment on lines -154 to -166
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was created to confirm that any change we do in CI or in Docker doesn't affect the ability to read the proper $NETWORK environment variable. As it had happened before that some changes breaks this behavior, and then the tests are running in Mainnet instead of Testnet, but we realized too late or had to wait for some tests to run to confirm it.

Copy link
Member Author

@upbqdn upbqdn Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing with this PR, similarly to the other one. I have a better approach in mind, which I didn't do in this PR. Let's add it back in a subsequent PR?

Moreover, there seem to be some parts that are bugs or hard to understand. For example, I couldn't figure out what -e NETWORK in test_variables is supposed to do. Also, setting ENTRYPOINT_FEATURES to the empty string to enable the test in the entrypoint makes it very hard to follow the execution path in the whole pipeline.

Copy link
Member

@gustavovalverde gustavovalverde Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-e NETWORK tells docker to use whatever value the $NETWORK env variable is set to, to override any default that was set at build time, or in the Dockerfile. This happens here:

env:
NETWORK: '${{ inputs.network }}'
and makes Zebra run with a Testnet configuration, and the test validates that Zebra is correctly running with it.

This has saved me (and others) from making mistakes multiple times while making CI refactors, so this is very important under those circumstances.

I do agree that setting the ENTRYPOINT_FEATURES to an empty string is a dirty hack to make this work, but that's a tech debt that wouldn't justify removing the whole test. In any case, we can remove the use of the ENTRYPOINT_FEATURES variables, while keeping this test behavior.

I'd suggest commenting this and adding a TODO in top of it, or creating an issue, instead of removing the test. Just so we don't forget later on, as this is an important validation.

# Test that Zebra works using $ZEBRA_CONF_PATH config
test-zebra-conf-path:
name: Test CI custom Docker config file
Expand Down
Loading
Loading