A basic repository structure for running CI/CD jobs on the Outerbounds platform.
- In the Outerbounds UI, go to the **Admin** panel in the left-side navigation and select **Users**.
- Under **Machines**, click the **Create New** button.
- Fill out the form, choosing the GitHub Actions option and filling in the desired GitHub organization and repository.
- After submitting, click the row for the Machine User you created, and a code snippet will appear.
- Paste the command into an actions file in `.github/workflows/` and modify it to run Metaflow code in the repository, as in the sketch below.
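
Once the snippet is in place, a workflow file might look like the following minimal sketch. The file name, job name, Python version, and the `requirements.txt` install step are illustrative assumptions, and the authentication step is a placeholder for the snippet the Outerbounds UI generates:

```yaml
# .github/workflows/run_flow.yml — illustrative sketch, not the exact
# snippet generated by the Outerbounds UI.
name: Run Metaflow flow
on:
  workflow_dispatch:  # manual trigger; event-based triggers also work
jobs:
  run-flow:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt  # assumes a requirements file
      # Paste the authentication command from the Machine User panel here,
      # typically reading a token from a repository secret.
      - run: python evaluate_new_model_flow.py run
```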
Our goal is to update the model used in the `Predict` workflow defined in `predict_flow.py`. As a starting point for the CI/CD lifecycle, consider how a data scientist iterates. This repository demonstrates how to take the result of experimental, interactive development and use it to:
- create a GitHub branch,
- let an automatic CI/CD process built with GitHub Actions validate the model's quality (using Outerbounds platform resources), and
- if the new model code meets certain user-defined criteria, automatically deploy the newly trained model to be used in the production workflow that makes predictions accessed by other production applications.
A data scientist or ML engineer would do this rarely, typically less often than they update the model selection/architecture code in `my_data_science_module.py`. Deploying the production workflow is only needed when the code in `predict_flow.py` changes:
```
python predict_flow.py --production argo-workflows create
```
To manually trigger a refresh of the production run that populates the model prediction cache accessed by other production applications:
```
python predict_flow.py --production argo-workflows trigger
```
Local/workstation testing:
```
python evaluate_new_model_flow.py run
```
When a data scientist is satisfied with what they see in local runs, they can use Git commands as in a regular software development workflow:
```
git switch -c 'my-new-model-branch'
git add .
git commit -m 'A model I think is ready for production'
git push --set-upstream origin my-new-model-branch
```
After the model code is pushed to the remote `my-new-model-branch` branch, the data scientist or an engineering colleague can open a pull request against the main branch. When this pull request is merged into the `main` branch of the repository, a GitHub Action defined in `.github/workflows/assess_new_production_model.yml` is triggered. To explore the many patterns like this that you can implement with GitHub Actions, see step 5 of the Create and Configure your IAM Role section, and the many types of events you can use to trigger a GitHub Action.
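
For reference, a merge-to-main trigger can be expressed as a push event filter in the workflow file. This is a minimal sketch; the actual trigger block in `assess_new_production_model.yml` may differ:

```yaml
# Sketch: run the action whenever commits (e.g., a merged pull request)
# land on the main branch.
on:
  push:
    branches:
      - main
```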
The GitHub Action in this template will do the following:
- Run the `EvaluateNewModel` workflow defined in `evaluate_new_model_flow.py`.
- If the `EvaluateNewModel` workflow produces a model that meets some user-defined criteria (e.g., exceeding some performance metric threshold), tag the Metaflow run in which the model was trained as a `deployment_candidate`.
- If the upstream `EvaluateNewModel` run is tagged as a `deployment_candidate` and the model meets any other criteria you add to this template, the production workflow in `predict_flow.py` will use the new version of the model on an ongoing basis (see the sketch after this list).
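
Tagging and consuming the candidate can be done with the Metaflow client API. Below is a minimal sketch, not the template's exact implementation; the artifact names (`accuracy`, `model`) and the threshold value are illustrative assumptions:

```python
# Sketch: tag a qualifying EvaluateNewModel run and fetch the latest
# candidate model via the Metaflow client API.
from metaflow import Flow

ACCURACY_THRESHOLD = 0.9  # hypothetical user-defined criterion


def tag_if_candidate() -> None:
    # Tag the latest successful run if its model clears the threshold.
    run = Flow("EvaluateNewModel").latest_successful_run
    if run is not None and run.data.accuracy >= ACCURACY_THRESHOLD:
        run.add_tag("deployment_candidate")


def latest_candidate_model():
    # Runs iterate newest-first; return the model from the most recent
    # run tagged as a deployment candidate.
    for run in Flow("EvaluateNewModel").runs("deployment_candidate"):
        return run.data.model
    return None
```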