We would like to thank the pioneers of the MLOpsPython repo; we have borrowed several aspects from it.
The goal is to teach the fundamentals of MLOps to Machine Learning practitioners through a hands-on approach.
Why should you care? To sustain the business benefits of Machine Learning across your organization, you need to bring in discipline, automation & best practices. Enter MLOps.
Approach
- Minimalistic, end-to-end MLOps pipeline: a fully YAML-based CI/CD pipeline (no proprietary release pipelines in Azure DevOps), gated releases (manual approvals) and fully CLI-based MLOps
- Focus is on a clean, understandable pipeline & code - the goal is to teach
- Additional scenarios will include model explanation, data drift detection, etc.
Technologies: We will use Azure Machine Learning & Azure DevOps to showcase CI/CD pipelines for a Machine Learning project. However, the concepts are valid irrespective of vendor platform.
Get Started
- Understand what we are trying to do (see the section below + workshop discussion)
- Set up the environment
- Run an end-to-end MLOps pipeline
Note: Automated builds based on code/asset changes have been disabled by setting trigger: none in the pipelines, to avoid triggering accidental builds during your learning phase.
The above diagram illustrates a possible end-to-end MLOps scenario. Our current Build-Release pipeline covers a subset: Training ➡️ Approval ➡️ Model Registration ➡️ Package ➡️ Deploy in Test.
Notes on our Base scenario:
- Directory structure:
  - mlops_pipelines contains the DevOps pipelines:
    - EnvCreatePipeline.yml is a DevOps pipeline that provisions all the required components in the cloud
    - BuildReleasePipeline.yml is a DevOps pipeline that performs the subset of steps mentioned above (Training through Deployment in Test)
  - code contains the source code for training and scoring; Azure ML uses it to create the Docker images that perform training & scoring (a scoring-script sketch follows this list)
  - dataset contains the German Credit dataset
- Training: For training we use a simple LogisticRegression model on the German Credit dataset. We build an sklearn pipeline that performs the feature engineering, and we export the whole pipeline as the model binary (a pkl file); see the training sketch after this list.
- We use the Azure ML CLI to interact with Azure ML, for simplicity.
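
As an illustration of the training step above, here is a minimal sketch of such an sklearn pipeline, assuming hypothetical column names, file paths and preprocessing choices rather than the exact contents of the code directory: feature engineering and LogisticRegression are combined into one Pipeline, which is then exported as the pkl model binary.

```python
# Minimal training sketch (illustrative; column names and paths are assumptions).
import os

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Load the German Credit dataset (hypothetical file name and label column).
df = pd.read_csv("dataset/german_credit.csv")
X, y = df.drop(columns=["Risk"]), df["Risk"]

# Feature engineering: scale numeric columns, one-hot encode categorical ones.
numeric_cols = X.select_dtypes(include="number").columns
categorical_cols = X.select_dtypes(exclude="number").columns
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# The whole pipeline (feature engineering + LogisticRegression) is the model artifact.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

# Export the fitted pipeline as the model binary (pkl file) for registration and packaging.
os.makedirs("outputs", exist_ok=True)
joblib.dump(model, "outputs/model.pkl")
```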
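
For the scoring side, Azure ML entry scripts conventionally expose an init() function that loads the model and a run() function that handles requests. The sketch below follows that convention; the model file name and the input payload shape are assumptions for illustration, not the exact code in the code directory.

```python
# Minimal Azure ML scoring entry-script sketch (init/run convention);
# model file name and input schema are illustrative assumptions.
import json
import os

import joblib
import pandas as pd

model = None


def init():
    # Azure ML sets AZUREML_MODEL_DIR to the folder containing the registered model;
    # fall back to the current directory for local testing.
    global model
    model_path = os.path.join(os.environ.get("AZUREML_MODEL_DIR", "."), "model.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    # Expect a JSON payload like {"data": [{"Age": 35, "Housing": "own", ...}]}.
    records = json.loads(raw_data)["data"]
    predictions = model.predict(pd.DataFrame(records))
    return {"predictions": predictions.tolist()}
```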
More documentation will follow.
Acknowledgments for the German Credit Dataset
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.