Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actions and input needed towards CI/CD for upcoming release #484

Closed
24 of 25 tasks
consideRatio opened this issue Feb 26, 2022 · 28 comments
Closed
24 of 25 tasks

Actions and input needed towards CI/CD for upcoming release #484

consideRatio opened this issue Feb 26, 2022 · 28 comments

Comments

@consideRatio
Copy link
Collaborator

consideRatio commented Feb 26, 2022

Background

Towards making a release (#381), it would be great to have automation for various things setup.

While working towards automation to build/push docker images to ghcr.io, I realize it ties into automation of publishing the Helm chart as well. I'm trying to plan ahead and raise questions for input as I'd like to solve many goals and not just pushing the images.

Making goals explicit

Important goals

  1. Release automation
    To have git ops based publishing of artifacts when we push a git tag.

Bonus goals

  1. Development release automation
    To have git obs based publishing of some artifacts when we push to main branch:
    • Docker images (ci: build/push images workflow, added #493)
    • Helm chart
      With this in place, it would make it far easier for me to for example make a manual end-to-end test of the Helm chart before making a release in a production deployment setting.

Action points

There are plenty of action points that I lack permissions to do, but I've outlined below what I think should be done and how to do it for anyone with relevant permissions. If you are a Dask GitHub organization admin you can do a lot of the steps below!

  • PyPI release preparations
    We need deployment keys for both PyPI projects available to this repo's GitHub workflows (dask-gateway, dask-gateway-server). This can only be done by a PyPI project maintainer, currently that must be @jcrist.
  • conda-forge release preparations
    Following a release has been made to PyPI, an automated PR will show up at conda-forge/dask-gateway-feedstock. Currently only @jcrist among the Dask team has permissions to merge those. Adding more maintainers would be good.
  • dask/helm-chart release preparations
    • Alternative strategy option:
      To publish the dask-gateway Helm chart, we rely on the dask/helm-chart repo's gh-pages branch. So, we will need a GitHub repo deployment key for that setup as a GitHub actions secret in this repo.
      1. Generating a private/public key pair:
        ssh-keygen -t ed25519 -C "dask/dask-gateway" -f /tmp/id_ed25519
        
      2. Registering the private key (/tmp/id_ed25519) as a secret named "gh_deploy_key__dask_helm_chart" for this repo in https://github.com/dask/dask-gateway/settings/secrets/actions
      3. Registering the public key (/tmp/id_ed25519.pub) as a deploy key with push rights for the dask/helm-chart repo at https://github.com/dask/helm-chart/settings/keys
    • Alternative strategy option:
      To make available a Private Access Token (PAT) from an account with access to push to the dask/helm-chart repo, such as from the dask bot account.
  • ghcr.io release preparation
    • BLOCKING UPSTREAM: It appears that the instructions at https://docs.github.com/en/packages/learn-github-packages/configuring-a-packages-access-control-and-visibility#inheriting-access-for-a-container-image-from-a-repository are outdated, and I've failed to manage to push a public image to ghcr.io. I will investigate this further.
      UPDATE: This was resolved, the UI had changed, but there was a "package settings" after all but a bit tricky to find.
    • Publish a dummy ghcr.io/dask/dask-gateway and ghcr.io/dask-gateway-server container manually so we can configure settings.
      1. You must be a Dask organization admin
      2. You must create a personal access token (PAT) for your account with write:packages permissions at https://github.com/settings/tokens/new.
      3. Login to the ghcr.io container registry with the PAT:
        docker login ghcr.io -u your-username
        
      4. Push a dummy image with a label to couple it with a github repo and enable us to configure permissions
        echo "FROM alpine" > Dockerfile
        docker build . --tag ghcr.io/dask/dask-gateway --label org.opencontainers.image.source=https://github.com/dask-gateway
        docker build . --tag ghcr.io/dask/dask-gateway-server --label org.opencontainers.image.source=https://github.com/dask-gateway-server
        docker push ghcr.io/dask/dask-gateway
        docker push ghcr.io/dask/dask-gateway-server
        
      5. Declare permissions/access for each container package
        For each container package: a) grant permissions to github actions to push to it, b) declare that access should be inherited, c) declare that it should be public. All settings should be found in the same page.

UPDATE: Removed Quay steps as we are no longer going down that road.

Input wanted for decisions

  • To git ops automate publishing to PyPI
    We've done it across the JupyterHub ecosystem and I personally love it because it makes cutting regular releases far more sustainable for us.

    • Decision: ok to go for it here?
  • To not pin image versions in Helm chart
    If we pin the versions images we reference in our helm chart, we will need to push git changes about that whenever they should change. I've thought a lot about this and concluded that if we managed the helm chart in a separate repository from the images, this would be fine. But, when they are part of the same, we should let them have placeholder values and let skaffold or chartpress update the image tags just before packaging and publishing a Helm chart.

    • Decision: ok to remove hard-coded references in the helm chart and rely on skaffold and/or chartpress?
  • To enable jupyterhub/chartpress to build images, while retaining support for using skaffold for local development experiences
    I recall skaffold brings some local development experiences to the table, while chartpress brings some publishing experiences to the table, and that in the past we have relied on chartpress for publishing the Helm chart - but not the images.

    The Helm chart should always be updated if a new image is to be published, so instead of having images build separate from the Helm chart in an automated CI system, I suggest the strategy to let chartpress do both. I want to be clear that this would be to make chartpress take on so far never automated responsibility, it wouldn't imply removing any local development use of skaffold etc.

    I have not kept up to date with skaffolds possibilities, but I would prefer to stick with what I know will work at this point rather than re-evaluating again if it would be possible for skaffold to take on the full responsibility of the tasks I want us to get done. I know that chartpress has evolved to meet all the requirements I see it needs, such as: a) building multi-architecture images, b) packaging and publishing to a Helm chart repo in a gh-pages branch, updating the chart's files Chart.yaml and values.yaml with the latest chart version and image versions.

    • Decision: ok to expand responsibility of chartpress to also build and publish images, along with the Helm chart they couple to?
@jacobtomlinson
Copy link
Member

Thanks for putting this together @consideRatio.

I'm personally not in favour of using Quay or GitHub Containers as part of this release. We have an ongoing discussion in dask/dask-docker#177 about where our Docker images should live and I don't want to supplant that by having Dask Gateway make decisions now.

My preference for the time being would be to continue using Docker Hub, despite the limitations, and continue the discussion about moving ALL Dask images to another platform, whether that is GitHub, Quay or something else.

I would be happy to make that decision now and migrate the Dask and Dask Notebook images somewhere else as part of this, but that adds even more complexity to this already crazy complex release.

To answer your direct questions:

Automate pushing to PyPI

I'm always in favour of this, and many Dask projects already publish automatically on tags. I would be pleased to see dask-gateway adopt this behaviour.

Not pin image versions in helm chart

If pins are removed what happens if a user tried to install an older version of the helm chart? Would they get latest Docker images and potentially fail?

Using chartpress for docker builds in addition to helm chart

Given that the docker images and helm chart are tightly coupled here and that chartpress supports all the features we would want then that sounds good to me.

@jacobtomlinson
Copy link
Member

I also want to raise that there has been no mention here of pushing Docker images to the existing daskgateway org on Docker Hub. I understand that the Dask Gateway docker images are coupled to the helm chart in a way that the core Dask images are not, but I want to avoid just abandoning those images. We should either continue pushing to them, or push a new image which exits with a deprecation error and link to the new images.

@consideRatio
Copy link
Collaborator Author

Thanks @jacobtomlinson for working with me on this!

I'm personally not in favour of using Quay or GitHub Containers as part of this release. We have an ongoing discussion in dask/dask-docker#177 about where our Docker images should live and I don't want to supplant that by having Dask Gateway make decisions now.

Fully agree, we stick with ghcr.io. Quay.io mentioned here was a distraction after having failed to setup ghcr.io, but those have been been resolved.

My preference for the time being would be to continue using Docker Hub [...]

I wrote dask/dask-docker#177 (comment) related to this, I figure it's better to not continue the discussion here. The gist: given credentials for github actions, I can make us push to dockerhub, but this repo doesn't have credentials setup currently for github actions.


If pins are removed what happens if a user tried to install an older version of the helm chart? Would they get latest Docker images and potentially fail?

No problem at all, version pinning will remain in the packaged helm charts, but in version control the version will have it set to a malfunctional version. Whenever we would package the chart for distribution, we would first automatically set valid versions. For reference, see the z2jh's git repo's unset versions.

@jacobtomlinson
Copy link
Member

jacobtomlinson commented Mar 2, 2022

Thanks @consideRatio sounds good. I'll take a look at working through some of the GHCR steps.

Also I think the Quay account you invited is an old corporate one that I don't have access to any more. I'll make a new one and see if we can invite that instead. Even if we are going to not use it I agree we should have control over the name. I recovered the account and have accepted.

@jacobtomlinson
Copy link
Member

jacobtomlinson commented Mar 2, 2022

@consideRatio

I've completed the GitHub Container Registry steps. Thanks for writing those out, made life so easy!

I'm also happy with the three decisions, given that this has been open for a few days lets move forward with lazy concensus.

For the Helm release I expect you should be able to use the DASK_BOT_TOKEN secret PAT that is enabled on this org to publish to the Helm Chart gh-pages branch and do not need a deploy secret.

I've put a strike through the quay section as we are no longer going down that path. However any Dask org owners who want to be added to the parked org on Quay can still ask @consideRatio or myself for access.

@martindurant
Copy link
Member

any Dask org owners who want to be added to the parked org on Quay

no rush, only if it becomes useful :)

@consideRatio
Copy link
Collaborator Author

Great progress!!

@jacobtomlinson I note that I made a mistake in my ghcr.io setup step, but also that that it didn't seem to have caused a problem for you. All ghcr.io image repositories/packages are correctly coupled with the github repository with the source code that defined them!

# in this step i made a typo in the label that couples
# this image repo with the github repo
docker build . --tag ghcr.io/dask/dask-gateway-server --label org.opencontainers.image.source=https://github.com/dask-gateway-server

@jacobtomlinson
Copy link
Member

Yeah I missed the mistake at first and went back and linked things up via the UI.

@jacobtomlinson
Copy link
Member

@consideRatio do you have thoughts on this?

For the Helm release I expect you should be able to use the DASK_BOT_TOKEN secret PAT that is enabled on this org to publish to the Helm Chart gh-pages branch and do not need a deploy secret.

@consideRatio
Copy link
Collaborator Author

For the Helm release I expect you should be able to use the DASK_BOT_TOKEN secret PAT that is enabled on this org to publish to the Helm Chart gh-pages branch and do not need a deploy secret.

I haven't become confident about what i think, but this is what I know.

  1. With a PAT, you can trigger workflows when pushing.
  2. With a deployment key, you can also trigger workflows when pushing I think - but not with a secrets.GITHUB_TOKEN provided to all GitHub workflows by default.
  3. Triggering workflows when we push to the gh-pages branch of dask/helm-charts is not relevant.
  4. PATs have greater permissions than they need for specific tasks such as pushing to a specific github repo, which is the permissions we can grant a github repo deploy SSH key.
  5. Using a github repo deploy SSH key, I'm confident on the chartpress integration will work without me needing to make changes etc by just re-using logic from z2jh and elsewhere, but with a PAT I may need to do some extra work. There has been rabbit-hole like struggles about using github actions provided GITHUB_TOKEN in the past I know, but they are perhaps different to a PAT to GitHub?

Overall, if you have no explicit preference towards going for using the DASK_BOT_TOKEN PAT, then I'd suggest you go for setting up a scoped github deploy SSH key that I can rely on instead. But, if you have a preference - I can probably make it work with the PAT anyhow!

@jacobtomlinson
Copy link
Member

I appreciate all of those points. I have two motivations here:

  • I'm very keen to avoid committing encrypted key files to repos, they are harder to update and troubleshoot than using secrets/env vars. It feels like a practice from the bygone Travis CI days.
  • I am keen to avoid creating an additional secret when one already exists. Multiple secrets just means more scope for things going wrong.

Do you have advice on whether we could do this with just secrets? Whether that is a deploy key or the existing `DASK_BOT_TOKEN secret?

@consideRatio
Copy link
Collaborator Author

consideRatio commented Mar 7, 2022

I'm very keen to avoid committing encrypted key files to repos, they are harder to update and troubleshoot than using secrets/env vars. It feels like a practice from the bygone Travis CI days.

The GitHub repo's deploy key (SSH key) would also be a github actions secret (environment variable only available to trusted workflow executions - not PR triggered workflows for example). It would not be a dedicated file we encrypt/decrypt.

A reason for having a dedicated file in the past when travis was used was because they restricted the length of their secret environment variables in a way not done by GitHub's CI system.

I appreciate this point as well - yuck for encrypted files in repo that also can't be described with comments as well because they are encrypted!

I am keen to avoid creating an additional secret when one already exists. Multiple secrets just means more scope for things going wrong.

Would we need DASK_BOT_TOKEN at all though in this repo? I appreciate how I perceive the complexity would be reduced by solely using credentials with more limited scope than a token that I don't know how it is setup - but I guess it is a single token with plenty of permissions, and if leaked, would need to be rotated across all repos using it.

I see that it is used here, but I've not been able to find any documentation about it beyond that: https://github.com/dask/helm-chart/blob/2255ba213354dfefa98c06185a74c3e8e2445b7a/.github/workflows/watch-dependencies.yml#L66-L78. Is it a token for a github user with maintainer permissions across the dask org?

@consideRatio
Copy link
Collaborator Author

Do you have advice on whether we could do this with just secrets? Whether that is a deploy key or the existing DASK_BOT_TOKEN secret?

@jacobtomlinson these steps would get it done without needing to version control anything related to it. A final step would be to cleanup your computer removing the generated ssh key.

To publish the dask-gateway Helm chart, we rely on the dask/helm-chart repo's gh-pages branch. So, we will need a GitHub repo deployment key for that setup as a GitHub actions secret in this repo.

  1. Generating a private/public key pair:
    ssh-keygen -t ed25519 -C "dask/dask-gateway" -f /tmp/id_ed25519
    
  2. Registering the private key (/tmp/id_ed25519) as a secret named "gh_deploy_key__dask_helm_chart" for this repo in https://github.com/dask/dask-gateway/settings/secrets/actions
  3. Registering the public key (/tmp/id_ed25519.pub) as a deploy key with push rights for the dask/helm-chart repo at https://github.com/dask/helm-chart/settings/keys

@jacobtomlinson
Copy link
Member

Would we need DASK_BOT_TOKEN at all though in this repo? I appreciate how I perceive the complexity would be reduced by solely using credentials with more limited scope than a token that I don't know how it is setup - but I guess it is a single token with plenty of permissions, and if leaked, would need to be rotated across all repos using it.

I can't seem to find the issue/PR but I'm sure there was a desire to have a PAT that could trigger workflows in this repo, so I enabled it here.

but I guess it is a single token with plenty of permissions, and if leaked, would need to be rotated across all repos using it.

It's an org level secret so only needs rotating in one place. This is why I'm keen to reuse it where possible. We just enable it on repos that will benefit from it and if things go sideways rotate in a single location.

Is it a token for a github user with maintainer permissions across the dask org?

Yes, although with limited scope.

these steps would get it done without needing to version control anything related to it

Awesome!

@consideRatio
Copy link
Collaborator Author

consideRatio commented Mar 20, 2022

I found time to allocate and move onwards with this now. I opted to make many small PRs that should be independent from each other and easy to review. I think I'll want to wait a bit for these to be resolved before continuing as it will possibly start creating merge conflicts if I from now attempt to use chartpress to publish the Helm chart etc.

PRs recently opened

@consideRatio
Copy link
Collaborator Author

@jacobtomlinson I struggle overall with getting all work of relevance done and know it would add some additional testing etc to ensure things work using the daskbot PAT instead of the github deploy key for the specific repo. Would you be okay going for that solution instead (steps in #484 (comment))? I'll make sure that its use is clearly documented to ensure long term maintenance is ensured to be easy!

@consideRatio
Copy link
Collaborator Author

@jacobtomlinson I've learned about using a PAT with chartpress and I'm hopeful that it will work quickly out of the box now! Let's go with the daskhub bot PAT!

@jacobtomlinson do you have time to review/merge the CI PRs?

@jacobtomlinson
Copy link
Member

Thanks so much @consideRatio. I've approved and merged most things. I've left the PyPI one until we get unblocked on credentials though.

@jacobtomlinson
Copy link
Member

I've now merged that one too, but wanted to elevate this comment from @jcrist #494 (comment)

I'm not sure what your wheel building automation plans are, but note that dask-gateway-server has some native (go) code that will need to be fixed and checked by auditwheel. Previously I had a hacky buildscript I'd run locally for the osx & linux builds, but if I were to redo this today I'd just use cibuildwheel and be done with it. Apologies if y'all already know all this.

@consideRatio
Copy link
Collaborator Author

Thank you @jacobtomlinson!!

Compiling the wheel-bundled Go code should absolutely be done and I'll learn more follow up with that, while doing it also ensuring that there is a test before pushing to PyPI that the built wheel includes what we want.

@consideRatio
Copy link
Collaborator Author

consideRatio commented Mar 25, 2022

@jacobtomlinson this repo manages a few other Docker images solely used for CI integration tests against the non-k8s backend providers of Dask clusters (slurm, hadoop, yarn). Can you establish these images as well with the procedure taken above for the other public images? These images are only used for testing etc, but needs to be rebuilt from time to time as well. I figure for now we do it manually as has been done before as well, but that we publish them to ghcr.io and enable anyone with write access to this repo to push those images with the procedure below.

  1. You must be a Dask organization admin

  2. You must create a personal access token (PAT) for your account with write:packages permissions at https://github.com/settings/tokens/new.

  3. Login to the ghcr.io container registry with the PAT:

    docker login ghcr.io -u your-username
    
  4. Push a dummy image with a label to couple it with a github repo and enable us to configure permissions

    echo "FROM alpine" > Dockerfile
    docker build . --tag ghcr.io/dask/dask-gateway-ci-base   --label org.opencontainers.image.source=https://github.com/dask-gateway
    docker build . --tag ghcr.io/dask/dask-gateway-ci-hadoop --label org.opencontainers.image.source=https://github.com/dask-gateway
    docker build . --tag ghcr.io/dask/dask-gateway-ci-pbs    --label org.opencontainers.image.source=https://github.com/dask-gateway
    docker build . --tag ghcr.io/dask/dask-gateway-ci-slurm  --label org.opencontainers.image.source=https://github.com/dask-gateway
    docker push ghcr.io/dask/dask-gateway-ci-base
    docker push ghcr.io/dask/dask-gateway-ci-hadoop
    docker push ghcr.io/dask/dask-gateway-ci-pbs
    docker push ghcr.io/dask/dask-gateway-ci-slurm
    
  5. Declare permissions/access for each container package
    For each container package: a) grant permissions to github actions to push to it, b) declare that access should be inherited, c) declare that it should be public. All settings should be found in the same page.

  • This has been done!

@consideRatio
Copy link
Collaborator Author

consideRatio commented Mar 27, 2022

Thanks @jacobtomlinson @martindurant and @jcrist for your review efforts and help to maintain momentum! If you find time to review some more PRs, here is a list of PRs since last week.

@consideRatio
Copy link
Collaborator Author

consideRatio commented Mar 27, 2022

Thank you so much @jcrist for your review efforts!

I'm somewhat blocked in #519 but can be unblocked by steps in #484 (comment) - that would provide image repositories that I can push to so I don't have to setup a temporary image repository to update outdated Dockerfiles used in the CI system.

@consideRatio
Copy link
Collaborator Author

Ping @jacobtomlinson, can you tackle #484 (comment) to unblock work in #519?

@jacobtomlinson
Copy link
Member

@consideRatio I have created the new container packages. One question though, do they actually need to be public? Given that they are only ever built and used in CI it doesn't seem great to include them in our package list. The secrets.GITHUB_TOKEN should have correct permission to push and pull these images in CI.

If they do need to be public you should have enough permissions yourself now to make that change.

@consideRatio
Copy link
Collaborator Author

Thank you @jacobtomlinson! I don't think they need to be public, i didnt consider that aspect - it would be good if they arent public as that may mislead someone to use them for something.

@consideRatio
Copy link
Collaborator Author

consideRatio commented Apr 2, 2022

EDIT: IGNORE THIS

It turns out that a PR can also read private ghcr.io images as long as the container package's settings declare that GitHub Actions for a specific repo should be allowed to do so.


@jacobtomlinson hmmm since they are to be accessed in CI test workflows that run when PRs are triggered, we won't have access to ${{ secrets.github_token }} to access the container registry and pull the container.

Due to that, we must have the images public after all. Sadly I fail to update the visibility of the images, it just sais in a red bar after making the change that the change failed and I should try again later. I have done that, and based on documentation etc I do think I'm allowed to update this - it just doesn't work though.

Can you try to update the images to be public?

https://github.com/orgs/dask/packages/container/dask-gateway-ci-base/settings
https://github.com/orgs/dask/packages/container/dask-gateway-ci-hadoop/settings
https://github.com/orgs/dask/packages/container/dask-gateway-ci-pbs/settings
https://github.com/orgs/dask/packages/container/dask-gateway-ci-slurm/settings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants