Refactor wheel upload job to a separate job running on GH ephemeral runner (#4877)

To run the upload part in a separate upload job on GH ephemeral runners,
we need:

1. A specific artifact name for each binary, so that the upload job can
find the correct one.
2. A new GHA `setup-binary-upload` to:
    1. Download the artifacts from GitHub.
    2. Run `pkg-helpers`, which is needed to figure out the correct S3
    bucket and path to upload to.
3. A new GHA reusable workflow `_binary_upload` to upload the
artifacts to S3.
    1. It runs on the GH ephemeral runner `ubuntu-22.04`.
    2. Only this job has access to the credential; the build job doesn't
    have that privilege anymore.
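
As a rough illustration of point 1, the build job and the upload job can only find the same artifact if both derive the name from the same inputs. The sketch below mirrors the `ARTIFACT_NAME` expression used in this commit; the `artifact_name` helper and the sample values (`nightly`, `3.11`, `cu121`, `x86_64`) are hypothetical and only serve the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the artifact name from the same five inputs in both jobs.
# "/" is not allowed in artifact names, so the first "/" in the repository
# name is replaced with "_" using bash pattern substitution.
# NOTE: artifact_name is a hypothetical helper, not part of the commit.
artifact_name() {
  local repository="$1" ref="$2" python_version="$3" cu_version="$4" arch="$5"
  echo "${repository/\//_}_${ref}_${python_version}_${cu_version}_${arch}"
}

# Hypothetical matrix entry, for illustration only.
ARTIFACT_NAME="$(artifact_name "pytorch/vision" "nightly" "3.11" "cu121" "x86_64")"
echo "${ARTIFACT_NAME}"
```

If either job passes a different value for any of the five inputs, the names diverge and the download step will not find the artifact.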

A small caveat here is that the upload job will depend on the build job
with all its configuration matrix, so it can only be run after all build
configurations finish successfully, not when individual builds finish.

The PR is quite big, so I will do a similar follow-up for conda build
after this, using the same `_binary_upload` reusable workflow.
huydhn authored Jan 15, 2024
1 parent 5a8239d commit 8acbaa9
Showing 6 changed files with 226 additions and 109 deletions.
26 changes: 20 additions & 6 deletions .github/actions/setup-binary-builds/action.yml
@@ -7,9 +7,9 @@ inputs:
description: If set to any value, don't use sudo to clean the workspace
required: false
type: string
default: ""
default: ''
ref:
description: Works as stated in actions/checkout, but the default value is recursive
description: Works as stated in actions/checkout
required: false
type: string
default: nightly
@@ -19,15 +19,27 @@ inputs:
type: string
default: recursive
setup-miniconda:
description: Works as stated in actions/checkout, but the default value is recursive
description: Set to true if setup-miniconda is needed
required: false
type: boolean
default: false
python-version:
description: Works as stated in actions/checkout, but the default value is recursive
description: The target Python version
required: true
type: string
cuda-version:
description: The target CUDA version
required: true
type: string
arch:
description: The target ARCH
required: true
type: string
upload-to-base-bucket:
description: One of the parameter used by pkg-helpers
required: false
type: boolean
default: false
default: no

runs:
using: composite
@@ -62,11 +74,13 @@ runs:
shell: bash
env:
PYTHON_VERSION: ${{ inputs.python-version }}
CU_VERSION: ${{ inputs.cuda-version }}
ARCH: ${{ inputs.arch }}
run: |
set -euxo pipefail
# Set artifact name here since github actions doesn't have string manipulation tools
# and "/" is not allowed in artifact names
echo "ARTIFACT_NAME=${REPOSITORY/\//_}_${REF}_${PYTHON_VERSION}" >> "${GITHUB_ENV}"
echo "ARTIFACT_NAME=${REPOSITORY/\//_}_${REF}_${PYTHON_VERSION}_${CU_VERSION}_${ARCH}" >> "${GITHUB_ENV}"
- name: Setup miniconda (for pytorch_pkg_helpers)
if: ${{ inputs.setup-miniconda == 'true' }}
uses: conda-incubator/[email protected]
73 changes: 73 additions & 0 deletions .github/actions/setup-binary-upload/action.yml
@@ -0,0 +1,73 @@
name: Set up binary upload jobs

description: Setup a GitHub ephemeral runner to upload binary wheel and conda artifacts

inputs:
repository:
description: The repository name, i.e. pytorch/vision
required: true
type: string
ref:
description: Part of the artifact name
required: false
type: string
default: ''
python-version:
description: Part of the artifact name
required: true
type: string
cuda-version:
description: Part of the artifact name
required: true
type: string
arch:
description: Part of the artifact name
required: true
type: string
upload-to-base-bucket:
description: One of the parameter used by pkg-helpers
required: false
type: boolean
default: no

runs:
using: composite
steps:
- uses: actions/setup-python@v4
with:
python-version: '3.11'
cache: pip

- name: Set the artifact name
shell: bash
env:
REPOSITORY: ${{ inputs.repository }}
REF: ${{ inputs.ref }}
PYTHON_VERSION: ${{ inputs.python-version }}
CU_VERSION: ${{ inputs.cuda-version }}
ARCH: ${{ inputs.arch }}
run: |
set -ex
# Set artifact name here since github actions doesn't have string manipulation tools
# and "/" is not allowed in artifact names
echo "ARTIFACT_NAME=${REPOSITORY/\//_}_${REF}_${PYTHON_VERSION}_${CU_VERSION}_${ARCH}" >> "${GITHUB_ENV}"
- name: Generate env variables from pytorch_pkg_helpers
shell: bash
env:
REPOSITORY: ${{ inputs.repository }}
REF: ${{ inputs.ref }}
PYTHON_VERSION: ${{ inputs.python-version }}
CU_VERSION: ${{ inputs.cuda-version }}
ARCH: ${{ inputs.arch }}
run: |
set -ex
python -m pip install tools/pkg-helpers
BUILD_ENV_FILE="${RUNNER_TEMP}/build_env_${GITHUB_RUN_ID}"
python -m pytorch_pkg_helpers > "${BUILD_ENV_FILE}"
cat "${BUILD_ENV_FILE}"
echo "BUILD_ENV_FILE=${BUILD_ENV_FILE}" >> "${GITHUB_ENV}"
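
The hand-off in the last step above can be sketched as follows: one step writes generated variables to a file and records the file's path in `GITHUB_ENV`, and a later step sources that file. This is a minimal sketch under stated assumptions: `fake_pkg_helpers` is a hypothetical stand-in for `python -m pytorch_pkg_helpers` (assumed to print sourceable shell lines, since the workflow later runs `source` on the file), the bucket path is made up, and `GITHUB_ENV` / `RUNNER_TEMP` are emulated with temporary files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins for what the runner provides; in a real job GITHUB_ENV and
# RUNNER_TEMP are set by GitHub Actions.
RUNNER_TEMP="$(mktemp -d)"
GITHUB_ENV="${RUNNER_TEMP}/github_env"
: > "${GITHUB_ENV}"

# Hypothetical replacement for `python -m pytorch_pkg_helpers`.
# The bucket path is invented for illustration.
fake_pkg_helpers() {
  echo 'export PYTORCH_S3_BUCKET_PATH="s3://example-bucket/whl/nightly/cpu/"'
}

# Step 1: write the generated variables to a file and record its location.
BUILD_ENV_FILE="${RUNNER_TEMP}/build_env_example"
fake_pkg_helpers > "${BUILD_ENV_FILE}"
echo "BUILD_ENV_FILE=${BUILD_ENV_FILE}" >> "${GITHUB_ENV}"

# Step 2 (a later step): the runner would export every KEY=VALUE line in
# GITHUB_ENV; here that is emulated by reading the line back by hand.
RECORDED_FILE="$(grep '^BUILD_ENV_FILE=' "${GITHUB_ENV}" | cut -d= -f2-)"
# shellcheck disable=SC1090
source "${RECORDED_FILE}"
echo "${PYTORCH_S3_BUCKET_PATH}"
```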
97 changes: 97 additions & 0 deletions .github/workflows/_binary_upload.yml
@@ -0,0 +1,97 @@
name: upload

on:
workflow_call:
inputs:
repository:
description: 'Repository to checkout, defaults to ""'
default: ''
type: string
ref:
description: 'Reference to checkout, defaults to "nightly"'
default: 'nightly'
type: string
build-matrix:
description: "Build matrix to utilize"
default: ''
type: string
architecture:
description: Architecture to build for x86_64 for default Linux, or aarch64 for Linux aarch64 builds
required: false
type: string
default: ''
trigger-event:
description: "Trigger Event in caller that determines whether or not to upload"
type: string
default: ''

jobs:
upload:
runs-on: ubuntu-22.04
environment: ${{(inputs.trigger-event == 'push' && (startsWith(github.event.ref, 'refs/heads/nightly') || startsWith(github.event.ref, 'refs/tags/v'))) && 'pytorchbot-env' || ''}}
strategy:
fail-fast: false
matrix: ${{ fromJSON(inputs.build-matrix) }}
timeout-minutes: 30
name: ${{ matrix.build_name }}
steps:
- uses: actions/checkout@v3

# For pytorch_pkg_helpers which we need to run to generate the artifact name and target S3 buckets
- uses: ./.github/actions/setup-binary-upload
with:
repository: ${{ inputs.repository }}
ref: ${{ inputs.ref }}
python-version: ${{ matrix.python_version }}
cuda-version: ${{ matrix.desired_cuda }}
arch: ${{ inputs.architecture }}
upload-to-base-bucket: ${{ matrix.upload_to_base_bucket }}

- uses: ./.github/actions/set-channel

- name: Download the artifact
uses: actions/download-artifact@v3
with:
name: ${{ env.ARTIFACT_NAME }}
path: ${{ inputs.repository }}/dist/

- name: Configure aws credentials (pytorch account)
if: ${{ inputs.trigger-event == 'push' && startsWith(github.event.ref, 'refs/heads/nightly') }}
uses: aws-actions/configure-aws-credentials@v3
with:
role-to-assume: arn:aws:iam::749337293305:role/gha_workflow_nightly_build_wheels
aws-region: us-east-1

- name: Configure aws credentials (pytorch account)
if: ${{ env.CHANNEL == 'test' && startsWith(github.event.ref, 'refs/tags/v') }}
uses: aws-actions/configure-aws-credentials@v3
with:
role-to-assume: arn:aws:iam::749337293305:role/gha_workflow_test_build_wheels
aws-region: us-east-1

- name: Nightly or release RC
if: ${{ (inputs.trigger-event == 'push' && startsWith(github.event.ref, 'refs/heads/nightly')) || (env.CHANNEL == 'test' && startsWith(github.event.ref, 'refs/tags/')) }}
shell: bash
run: |
set -ex
echo "NIGHTLY_OR_TEST=1" >> "${GITHUB_ENV}"
- name: Upload package to pytorch.org
shell: bash
working-directory: ${{ inputs.repository }}
run: |
set -ex
# shellcheck disable=SC1090
source "${BUILD_ENV_FILE}"
pip install awscli==1.32.18
AWS_CMD="aws s3 cp --dryrun"
if [[ "${NIGHTLY_OR_TEST:-0}" == "1" ]]; then
AWS_CMD="aws s3 cp"
fi
for pkg in dist/*; do
${AWS_CMD} "$pkg" "${PYTORCH_S3_BUCKET_PATH}" --acl public-read
done
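
The upload step above defaults to a dry run and only performs a real copy when `NIGHTLY_OR_TEST` is `1`. The sketch below shows the same gating with the command held in a bash array rather than an unquoted string, which is one possible way to avoid word-splitting surprises; it is not what the commit does. The `aws` function is a stub that shadows the real CLI so the sketch runs without credentials, and `upload_packages`, the wheel name, and the bucket path are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub that shadows the real AWS CLI; it only prints the command line
# that would have been executed.
aws() {
  echo "aws $*"
}

# Default to a dry run and drop --dryrun only when the run was marked as a
# nightly or release candidate. upload_packages is a hypothetical helper.
upload_packages() {
  local dest="$1"
  shift
  local cmd=(aws s3 cp --dryrun)
  if [[ "${NIGHTLY_OR_TEST:-0}" == "1" ]]; then
    cmd=(aws s3 cp)
  fi
  local pkg
  for pkg in "$@"; do
    "${cmd[@]}" "${pkg}" "${dest}" --acl public-read
  done
}

# Hypothetical wheel name and bucket path, for illustration only.
DRY_RUN_OUTPUT="$(NIGHTLY_OR_TEST=0 upload_packages "s3://example-bucket/whl/" "dist/example.whl")"
REAL_OUTPUT="$(NIGHTLY_OR_TEST=1 upload_packages "s3://example-bucket/whl/" "dist/example.whl")"
echo "${DRY_RUN_OUTPUT}"
echo "${REAL_OUTPUT}"
```

Defaulting to `--dryrun` means a pull-request run still exercises the upload path without writing anything to the bucket.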
46 changes: 12 additions & 34 deletions .github/workflows/build_wheels_linux.yml
@@ -65,14 +65,6 @@ on:
required: false
type: boolean
default: true
# TODO (huydhn): Remove them once all libraries using Nova has removed them
secrets:
AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID:
description: "AWS Access Key passed from caller workflow"
required: false
AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY:
description: "AWS Secret Access Ket passed from caller workflow"
required: false

permissions:
id-token: write
@@ -93,7 +85,6 @@ jobs:
ARCH: ${{ inputs.architecture }}
name: ${{ matrix.build_name }}
runs-on: ${{ matrix.validation_runner }}
environment: ${{(inputs.trigger-event == 'push' || startsWith(github.event.ref, 'refs/tags/')) && 'pytorchbot-env' || ''}}
container:
image: ${{ matrix.container_image }}
options: ${{ matrix.gpu_arch_type == 'cuda' && '--gpus all' || ' ' }}
@@ -153,6 +144,8 @@ jobs:
ref: ${{ inputs.ref }}
setup-miniconda: ${{ inputs.setup-miniconda }}
python-version: ${{ env.PYTHON_VERSION }}
cuda-version: ${{ env.CU_VERSION }}
arch: ${{ env.ARCH }}
- name: Combine Env Var and Build Env Files
if: ${{ inputs.env-var-script != '' }}
working-directory: ${{ inputs.repository }}
@@ -235,31 +228,16 @@ jobs:
echo "${{ inputs.repository }}/${SMOKE_TEST_SCRIPT} found"
${CONDA_RUN} python "${{ inputs.repository }}/${SMOKE_TEST_SCRIPT}"
fi
# TODO (huydhn): Move the following step to a separate build job
- name: Configure aws credentials (pytorch account)
if: ${{ inputs.trigger-event == 'push' && startsWith(github.event.ref, 'refs/heads/nightly') }}
uses: aws-actions/configure-aws-credentials@v3
with:
role-to-assume: arn:aws:iam::749337293305:role/gha_workflow_nightly_build_wheels
aws-region: us-east-1
- name: Configure aws credentials (pytorch account)
if: ${{ env.CHANNEL == 'test' && startsWith(github.event.ref, 'refs/tags/') }}
uses: aws-actions/configure-aws-credentials@v3
with:
role-to-assume: arn:aws:iam::749337293305:role/gha_workflow_test_build_wheels
aws-region: us-east-1
- name: Upload package to pytorch.org
if: ${{ (inputs.trigger-event == 'push' && startsWith(github.event.ref, 'refs/heads/nightly')) || (env.CHANNEL == 'test' && startsWith(github.event.ref, 'refs/tags/')) }}
shell: bash -l {0}
working-directory: ${{ inputs.repository }}
run: |
set -euxo pipefail
source "${BUILD_ENV_FILE}"
${CONDA_RUN} pip install awscli
for pkg in dist/*; do
# PYTORCH_S3_BUCKET_PATH derived from pkg-helpers
${CONDA_RUN} aws s3 cp "$pkg" "${PYTORCH_S3_BUCKET_PATH}" --acl public-read
done
upload:
needs: build
uses: ./.github/workflows/_binary_upload.yml
with:
repository: ${{ inputs.repository }}
ref: ${{ inputs.ref }}
build-matrix: ${{ inputs.build-matrix }}
architecture: ${{ inputs.architecture }}
trigger-event: ${{ inputs.trigger-event }}

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ inputs.repository }}-${{ github.event_name == 'workflow_dispatch' }}
49 changes: 13 additions & 36 deletions .github/workflows/build_wheels_macos.yml
@@ -59,14 +59,6 @@ on:
description: "The key created when saving a cache and the key used to search for a cache."
default: ""
type: string
# TODO (huydhn): Remove them once all libraries using Nova has removed them
secrets:
AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID:
description: "AWS Access Key passed from caller workflow"
required: false
AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY:
description: "AWS Secret Access Ket passed from caller workflow"
required: false

permissions:
id-token: write
@@ -82,9 +74,9 @@ jobs:
PACKAGE_TYPE: wheel
REPOSITORY: ${{ inputs.repository }}
REF: ${{ inputs.ref }}
CU_VERSION: ${{ matrix.desired_cuda }}
name: ${{ matrix.build_name }}
runs-on: ${{ inputs.runner-type }}
environment: ${{(inputs.trigger-event == 'push' || startsWith(github.event.ref, 'refs/tags/')) && 'pytorchbot-env' || ''}}
# If a build is taking longer than 60 minutes on these runners we need
# to have a conversation
timeout-minutes: 60
@@ -115,6 +107,8 @@ jobs:
ref: ${{ inputs.ref }}
setup-miniconda: false
python-version: ${{ env.PYTHON_VERSION }}
cuda-version: ${{ env.CU_VERSION }}
arch: ${{ env.ARCH }}
- name: Combine Env Var and Build Env Files
if: ${{ inputs.env-var-script != '' }}
working-directory: ${{ inputs.repository }}
@@ -123,7 +117,7 @@ jobs:
- name: Install delocate-wheel
run: |
set -euxo pipefail
${CONDA_RUN} python3 -m pip install delocate
${CONDA_RUN} python3 -m pip install delocate==0.10.7
- name: Install torch dependency
run: |
set -euxo pipefail
@@ -209,37 +203,20 @@ jobs:
${CONDA_RUN} python3 "${{ inputs.repository }}/${SMOKE_TEST_SCRIPT}"
fi
export PATH=${OLD_PATH}
# TODO (huydhn): Move the following step to a separate build job
- name: Configure aws credentials (pytorch account)
if: ${{ inputs.trigger-event == 'push' && startsWith(github.event.ref, 'refs/heads/nightly') }}
uses: aws-actions/configure-aws-credentials@v3
with:
role-to-assume: arn:aws:iam::749337293305:role/gha_workflow_nightly_build_wheels
aws-region: us-east-1
- name: Configure aws credentials (pytorch account)
if: ${{ env.CHANNEL == 'test' && startsWith(github.event.ref, 'refs/tags/') }}
uses: aws-actions/configure-aws-credentials@v3
with:
role-to-assume: arn:aws:iam::749337293305:role/gha_workflow_test_build_wheels
aws-region: us-east-1
- name: Upload package to pytorch.org
if: ${{ (inputs.trigger-event == 'push' && startsWith(github.event.ref, 'refs/heads/nightly')) || (env.CHANNEL == 'test' && startsWith(github.event.ref, 'refs/tags/')) }}
shell: bash -l {0}
working-directory: ${{ inputs.repository }}
run: |
set -euxo pipefail
# shellcheck disable=SC1090
source "${BUILD_ENV_FILE}"
${CONDA_RUN} pip install awscli
for pkg in dist/*; do
# PYTORCH_S3_BUCKET_PATH derived from pkg-helpers
${CONDA_RUN} aws s3 cp "$pkg" "${PYTORCH_S3_BUCKET_PATH}" --acl public-read
done
- name: Clean up disk space
if: always()
continue-on-error: true
uses: ./test-infra/.github/actions/check-disk-space

upload:
needs: build
uses: ./.github/workflows/_binary_upload.yml
with:
repository: ${{ inputs.repository }}
ref: ${{ inputs.ref }}
build-matrix: ${{ inputs.build-matrix }}
trigger-event: ${{ inputs.trigger-event }}

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ inputs.repository }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true