Update the deploy.yaml workflow to run the model on CI #26
@@ -0,0 +1,87 @@
name: Setup Terraform
description: Install and configure Terraform and AWS for the correct environment

inputs:
  role-to-assume:
    description: AWS IAM role to assume when running Terraform operations.
    required: true
  aws-account-id:
    description: AWS account ID to use to create resources.
    required: true
  batch-container-image-name:
    description: The name of the container image to use for the Batch job.
    required: true
  role-duration-seconds:
    description: How long the role specified by role-to-assume should be valid.
    required: false
    default: 3600
  tfvars-file:
    description: File to store Terraform variables.
    required: false
    default: terraform.tfvars

runs:
  using: composite
  steps:
    - name: Mask sensitive AWS IDs from Terraform logs
      run: |
        echo "::add-mask::${{ inputs.role-to-assume }}"
        echo "::add-mask::${{ inputs.aws-account-id }}"
      shell: bash

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: ${{ inputs.role-to-assume }}
        aws-region: us-east-1
        role-duration-seconds: ${{ inputs.role-duration-seconds }}

    - name: Set up Terraform
      uses: hashicorp/setup-terraform@v2

    - name: Initialize Terraform
      run: terraform init
      shell: bash
      working-directory: terraform

    - name: Set Terraform variables
      id: set-vars
      run: |
        # GITHUB_HEAD_REF is only set on pull_request events, so if it's
        # present, we must be in a PR context
        if [ -n "$GITHUB_HEAD_REF" ]; then
          echo "On pull request branch, setting terraform workspace to CI"
          # Replace slashes and underscores with hyphens in the workspace name
          # and force it to lowercase, since we use it to name resources and
          # we want to follow a consistent naming scheme
          WORKSPACE="$(echo $GITHUB_HEAD_REF | \
            sed -e 's/\//-/g' -e 's/_/-/g' | \
            tr '[:upper:]' '[:lower:]')"
          BATCH_JOB_NAME="ci_${WORKSPACE}_${GITHUB_REPOSITORY//\//-}"

        elif [[ $GITHUB_REF_NAME == 'master' ]]; then
          echo "On master branch, setting terraform workspace to prod"
          WORKSPACE="prod"
          BATCH_JOB_NAME="${GITHUB_REPOSITORY//\//-}"

        else
          echo "CI context did not match any of the expected environments"
          exit 1
        fi

        {
          echo "batch_job_name = \"$BATCH_JOB_NAME\"";
          echo "batch_container_image_name = \"$BATCH_CONTAINER_IMAGE_NAME\"";
        } > "$TFVARS_FILE"

        echo "workspace=$WORKSPACE" >> "$GITHUB_OUTPUT"
      shell: bash
      working-directory: terraform
      env:
        TFVARS_FILE: ${{ inputs.tfvars-file }}
        BATCH_CONTAINER_IMAGE_NAME: ${{ inputs.batch-container-image-name }}

    - name: Select Terraform workspace
      run: terraform workspace select -or-create "$WORKSPACE"
      shell: bash
      working-directory: terraform
      env:
        WORKSPACE: ${{ steps.set-vars.outputs.workspace }}
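For context on how this composite action is meant to be consumed, a deploy job could wire it up roughly as in the sketch below. The job name, the image value, and the `terraform apply` step are illustrative assumptions; only the secret names match the ones used by the cleanup workflow later in this PR.

```yaml
# Hypothetical caller sketch -- the deploy job shown here is not part of this
# diff; only the secret names are taken from the cleanup workflow below
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # needed for OIDC authentication with AWS
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: ./.github/actions/setup-terraform
        with:
          role-to-assume: ${{ secrets.AWS_IAM_ROLE_TO_ASSUME_ARN }}
          aws-account-id: ${{ secrets.AWS_ACCOUNT_ID }}
          # Assumed image name, for illustration only
          batch-container-image-name: ghcr.io/example-org/example-image:latest

      # After setup, the selected workspace and tfvars file are already in
      # place, so plan/apply can run directly against the terraform/ directory
      - name: Apply Terraform changes
        run: terraform apply -auto-approve
        working-directory: terraform
        shell: bash
```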
@@ -0,0 +1,119 @@

Review comment: Since this logic is shared between the steps that check for job startup and job completion, I factored it out into a shared script.

#!/usr/bin/env bash
#
# Poll an AWS Batch job to check its status.
#
# Takes two positional arguments:
#
#   1. (required) The ID of the Batch job to poll
#   2. (optional) An enum indicating which type of poll to run: `startup` or
#      `completion`. Defaults to `completion`. If the value of the argument is
#      `startup`, the script will treat the `RUNNING` status as a terminal
#      success status. Otherwise, only `SUCCEEDED` will count as a terminal
#      success status, and a status of `RUNNING` will cause the script to
#      continue polling.
#
# Example usage:
#
#   ./.github/scripts/poll_batch_job_status.sh 12345 startup
set -euo pipefail

BATCH_JOB_LOG_URL_PREFIX="https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/%2Faws%2Fbatch%2Fjob/log-events"
# How many times to poll AWS Batch job status while it's starting up before
# deciding to raise an error. Multiply by BATCH_JOB_POLL_INTERVAL_SECONDS to
# derive a timeout in seconds. There is no equivalent timeout for running
# jobs, because those timeouts can be set at the Batch level, whereas startup
# timeouts are not controllable by Batch
BATCH_JOB_POLL_STARTUP_MAX_RETRIES=60

Review comment: It could make sense to eventually make these attributes configurable, particularly once we factor this logic out into a shared action or workflow, but for the sake of faster iteration I decided to hardcode them for now since bash argument parsing is so annoying.

if [ -z ${1+x} ]; then
  echo "Missing Batch job ID"
  exit 1
fi

BATCH_JOB_ID="$1"

POLL_TYPE="completion"
# How long to wait between queries when polling
BATCH_JOB_POLL_INTERVAL_SECONDS=300  # 5 minutes

if [[ -n ${2+x} ]]; then
  case "$2" in
    "startup")
      POLL_TYPE="startup"
      BATCH_JOB_POLL_INTERVAL_SECONDS=60
      ;;

    "completion")
      ;;

    *)
      echo "Positional argument must be one of 'startup' or 'completion', "
      echo "got: '$2'"
      exit 1
      ;;
  esac
fi

echo "Polling for status of Batch job $BATCH_JOB_ID, waiting for $POLL_TYPE"

LOOP_COUNTER=0
while true; do
  echo "Getting status of job $BATCH_JOB_ID"
  JOB_DESCRIPTIONS=$(aws batch describe-jobs --jobs "$BATCH_JOB_ID")

  JOB_LIST=$(echo "$JOB_DESCRIPTIONS" | jq -r '.jobs')
  if [[ "$JOB_LIST" == "[]" ]]; then
    echo "Unexpected empty response from aws batch describe-jobs"
    exit 1
  fi

  JOB_STATUS=$(echo "$JOB_DESCRIPTIONS" | jq -r '.jobs[0].status')
  echo "Job status is $JOB_STATUS"

  JOB_LOG_STREAM_NAME=$( \
    echo "$JOB_DESCRIPTIONS" | \
    jq -r '.jobs[0].container.logStreamName' \
  )
  # Any slashes in the log stream name need to be urlencoded
  JOB_LOG_URL="${BATCH_JOB_LOG_URL_PREFIX}/${JOB_LOG_STREAM_NAME//\//%2F}"

  case "$JOB_STATUS" in
    "RUNNING")
      if [[ "$POLL_TYPE" == "startup" ]]; then
        echo "Job has started! See logs: $JOB_LOG_URL"
        exit 0
      fi
      ;;

    "SUCCEEDED")
      echo "Job succeeded!"
      exit 0
      ;;

    "FAILED")
      echo "Job failed :( See logs: $JOB_LOG_URL"
      echo "More logs and container metrics can also be found on the "
      echo "job detail page in the AWS Batch console"
      exit 1
      ;;

    *)
      if [[ "$LOOP_COUNTER" == "$BATCH_JOB_POLL_STARTUP_MAX_RETRIES" ]]; then
        echo "Failing workflow due to job startup timeout. This means "
        echo "that the job did not enter a RUNNING state within a "
        echo "reasonable amount of time. This usually indicates a "
        echo "problem in the underlying ECS or EC2 backend that can "
        echo "be debugged by checking cluster/instance logs in the "
        echo "AWS console."
        exit 1
      fi
      ;;
  esac

  echo "Sleeping ${BATCH_JOB_POLL_INTERVAL_SECONDS}s until next status check"
  sleep "$BATCH_JOB_POLL_INTERVAL_SECONDS"

  LOOP_COUNTER=$((LOOP_COUNTER + 1))
  echo "Starting status check #$LOOP_COUNTER"
done
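As a usage sketch, a deploy workflow could call this script from two separate steps, first to wait for startup and then for completion. The step names and the job-ID plumbing through a previous step's output are assumptions for illustration, not part of this diff.

```yaml
# Hypothetical workflow steps -- the submit-job step and its batch-job-id
# output are assumed, not defined in this diff
- name: Wait for Batch job to start
  run: ./.github/scripts/poll_batch_job_status.sh "$BATCH_JOB_ID" startup
  shell: bash
  env:
    BATCH_JOB_ID: ${{ steps.submit-job.outputs.batch-job-id }}

- name: Wait for Batch job to complete
  run: ./.github/scripts/poll_batch_job_status.sh "$BATCH_JOB_ID" completion
  shell: bash
  env:
    BATCH_JOB_ID: ${{ steps.submit-job.outputs.batch-job-id }}
```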
@@ -0,0 +1,33 @@
# Workflow that runs whenever a PR is closed or merged and deletes any
# AWS resources created by the `deploy` workflow.
name: cleanup-terraform

on:
  pull_request:
    types: [closed]

jobs:
  cleanup-terraform:
    runs-on: ubuntu-latest
    # These permissions are needed to interact with GitHub's OIDC Token endpoint
    # so that we can authenticate with AWS
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: ./.github/actions/setup-terraform
        with:
          role-to-assume: ${{ secrets.AWS_IAM_ROLE_TO_ASSUME_ARN }}
          aws-account-id: ${{ secrets.AWS_ACCOUNT_ID }}
          # This value can be anything, since Terraform doesn't need correct
          # values for variables in order to destroy resources
          batch-container-image-name: foobar

      - name: Delete resources using Terraform
        run: terraform destroy -auto-approve
        working-directory: terraform
        shell: bash
Review comment: Note that this action has a lot of logic in common with the equivalent action in `model-sales-val`. When I copy the workflows from this repo over to the condo model in ccao-data/model-condo-avm#5 I may decide to factor this logic out into an action that can be reused by workflows across repos; in the meantime, however, we accept a little bit of cross-repo duplication in order to iterate more quickly.

Reply: I think factoring this out would be wise, considering it's a pattern we're very likely to re-use. How about we make a `ccao-data/actions` repo, since we're starting to replicate lots of actions in general.

Reply: That sounds reasonable! My one concern is that a monorepo would make it harder to version individual actions, since new releases would have to apply to all the actions in the repo. The trade-off is that a monorepo would make it faster to iterate on actions, particularly for actions that depend on one another (e.g. I can imagine that eventually there may be a reusable `run-model` workflow that depends on a `setup-terraform` action). An alternative would be to use a dedicated prefix like `action-*` for repos that represent actions, and keep them separate for versioning purposes. I'm somewhat torn and curious which pattern you prefer!

Reply: I think the pattern in r-lib/actions works pretty well. Minor versions can change the code underlying the tag, and only breaking changes would warrant a major version bump. I think it makes sense to basically factor out stuff into composite actions in their own sub-directories. Curious what you think.

Reply: I'm down to try it! Conceptually it seems kind of strange to me that a new major version would be applied to all packages in the repo even if breaking changes were only introduced in a subset of those packages, but if it's a common pattern in practice then I think it makes sense for us to follow it.
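For reference, the r-lib/actions pattern discussed above would mean each composite action lives in its own sub-directory of a shared repo and callers pin it by tag. A hypothetical `ccao-data/actions` repo (not created in this PR) would be consumed roughly like this:

```yaml
# Hypothetical sketch, assuming a ccao-data/actions monorepo with a
# setup-terraform sub-directory tagged at v1 (neither exists in this PR)
- name: Setup Terraform
  uses: ccao-data/actions/setup-terraform@v1
  with:
    role-to-assume: ${{ secrets.AWS_IAM_ROLE_TO_ASSUME_ARN }}
    aws-account-id: ${{ secrets.AWS_ACCOUNT_ID }}
    batch-container-image-name: foobar
```

Minor releases would move the `v1` tag forward in place, while a breaking change to any action in the repo would warrant cutting a `v2` tag, which is the trade-off raised in the last two comments above.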