Continuous Adaptation for Machine Learning System to Data Changes (#TFCommunitySpotlight Awarded)
By Chansung Park and Sayak Paul
An MLOps system evolves as the world changes, and the need to evolve usually comes from data or concept drift. This project shows how to combine two separate pipelines, one for batch prediction and one for training, so that the system adapts to data changes. We worked with the TFX team to author a blog post detailing our approach. The blog post is available here: https://blog.tensorflow.org/2021/12/continuous-adaptation-for-machine.html.
We assume the reader is familiar with basic MLOps concepts (such as pipelines, data drift, and batch predictions), TensorFlow, TensorFlow Extended (TFX), and Vertex AI.
An MLOps system can also evolve when a much better algorithm (e.g., a state-of-the-art model) comes out. In that case, the system should adopt the better algorithm to understand the existing data better. We have demonstrated such workflows in the following projects:
- Model Training as a CI/CD System Part1: Reflect changes in codebase to MLOps pipeline: Code on GitHub, Article on the GCP blog
- Model Training as a CI/CD System Part2: Trigger, schedule, and run MLOps pipelines: Code on GitHub, Article on the GCP blog
- Run the initial training pipeline to train an image classifier and deploy it using TensorFlow, TFX, and Vertex AI (02_TFX_Training_Pipeline.ipynb).
- Download and prepare images from Bing search to simulate the data drift (97_Prepare_Test_Images.ipynb).
- Generate the batch prediction pipeline specification (JSON) (03_Batch_Prediction_Pipeline.ipynb).
- Deploy a cloud function that checks whether there is enough sample data to run the batch prediction pipeline and, if so, triggers it (04_Cloud_Scheduler_Trigger.ipynb).
- Schedule a periodic job to run the deployed cloud function (04_Cloud_Scheduler_Trigger.ipynb).
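The check performed by the cloud function can be sketched in plain Python. This is a minimal illustration and not the code from 04_Cloud_Scheduler_Trigger.ipynb: the function names, the `batch/` prefix, the image suffixes, and the `min_samples` threshold are all hypothetical, and the object names are passed in as a list so the sketch runs without any cloud dependency. In a real deployment the names would come from listing the storage bucket, and a positive decision would submit the generated pipeline specification.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical set of file suffixes treated as image samples.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


@dataclass
class TriggerDecision:
    num_samples: int      # how many image files were found under the prefix
    should_trigger: bool  # True when the batch prediction pipeline should run


def count_image_files(object_names: Iterable[str], prefix: str) -> int:
    """Count objects under `prefix` whose names look like image files."""
    return sum(
        1
        for name in object_names
        if name.startswith(prefix) and name.lower().endswith(IMAGE_SUFFIXES)
    )


def decide_trigger(
    object_names: Iterable[str], prefix: str, min_samples: int
) -> TriggerDecision:
    """Decide whether enough samples have accumulated to run batch prediction."""
    num_samples = count_image_files(object_names, prefix)
    return TriggerDecision(num_samples, num_samples >= min_samples)


if __name__ == "__main__":
    # Hypothetical bucket listing; only the two images under "batch/" count.
    names = ["batch/a.jpg", "batch/b.PNG", "batch/notes.txt", "other/c.jpg"]
    print(decide_trigger(names, prefix="batch/", min_samples=2))
```

Keeping the decision separate from the storage listing and the pipeline submission makes the threshold logic easy to test on its own; the scheduled job then only needs to call the function at a fixed interval.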
We developed several custom components in TFX for this project. You can find them under the custom_components directory.
- Initial Data Preparation (CIFAR10)
- Build Training Pipeline
- Build Batch Prediction Pipeline
  - FileListGen component
  - BatchPredictionGen component
  - PerformanceEvaluator component
  - SpanPreparator component
  - PipelineTrigger component
- Data Preparation for Data/Concept Drift Simulation (from Bing)
- Deploy Cloud Function, Schedule a Job to Trigger the Cloud Function
- End to End Test
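The hand-off from batch prediction to retraining can be sketched as a simple accuracy check. This is an illustration under stated assumptions, not the implementation in custom_components: it assumes that the evaluation step compares batch predictions against known labels and that retraining is triggered when accuracy falls below a threshold. The function names and the threshold values are hypothetical.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one sample is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def needs_retraining(
    predicted: Sequence[str], expected: Sequence[str], threshold: float
) -> bool:
    """Return True when accuracy drops below the (hypothetical) threshold."""
    return accuracy(predicted, expected) < threshold


if __name__ == "__main__":
    # Hypothetical batch prediction results and their ground-truth labels.
    preds = ["cat", "dog", "cat", "ship"]
    labels = ["cat", "dog", "dog", "ship"]
    print(accuracy(preds, labels), needs_retraining(preds, labels, threshold=0.9))
```

When the check returns True, a downstream step would prepare the newly collected data as a fresh span and start the training pipeline; when it returns False, the current model stays in place.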
We welcome feedback. Please create an issue to let us know what you think.
- ML-GDE program for providing GCP credits.
- Robert Crowe and Jiayi Zhao of Google for helping us with our technical doubts.