Coupon Acceptance Prediction Pipeline

Project Overview

This MLOps project uses an XGBoost model to predict whether a driver will accept a coupon offered in the vehicle. It implements a complete machine learning pipeline, from data preprocessing to model deployment, following MLOps best practices.

Architecture

Data Source

The project uses the "In-Vehicle Coupon Recommendation" dataset from the UCI Machine Learning Repository, providing rich information about users, merchants, and coupon characteristics.

Key Features

Destination (No Urgent Place, Home, Work)
Weather Conditions (Sunny, Rainy, Snowy)
Time of Day (7AM, 10AM, 2PM, 6PM, 10PM)
Coupon Categories (Restaurant(<$20), Restaurant($20-$50), Coffee House, Bar, Carry out & Take away)
Expiration Time (2 hours, 1 day)
Direction Alignment (Same direction or not)
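
Every feature in the list above takes one of a few discrete values, so each record has to be turned into numeric columns before a model can use it. The sketch below shows the idea in pure Python: it builds one "feature=value" column per observed pair, which mirrors how scikit-learn's DictVectorizer (the encoder named in the pipeline description) treats string values. The key names in `record`, the "yes"/"no" encoding of direction alignment, and the helper names are hypothetical and are not taken from the project code.

```python
from typing import Dict, List

# Hypothetical record: key names are illustrative, values come from the list above.
record = {
    "destination": "Work",
    "weather": "Sunny",
    "time": "7AM",
    "coupon": "Coffee House",
    "expiration": "2 hours",
    "same_direction": "yes",  # assumed encoding of "Same direction or not"
}


def fit_vocabulary(records: List[Dict[str, str]]) -> List[str]:
    """Collect every 'feature=value' pair seen in the data, sorted for a stable column order."""
    return sorted({f"{key}={value}" for rec in records for key, value in rec.items()})


def one_hot(rec: Dict[str, str], vocabulary: List[str]) -> List[int]:
    """Return a 0/1 row with a 1 in each column whose 'feature=value' pair occurs in rec."""
    active = {f"{key}={value}" for key, value in rec.items()}
    return [1 if column in active else 0 for column in vocabulary]


# Two records that differ only in weather and time of day.
records = [record, {**record, "weather": "Rainy", "time": "6PM"}]
vocabulary = fit_vocabulary(records)
row = one_hot(record, vocabulary)
```

A value that never appeared when the vocabulary was built (for example "Snowy" here) simply produces no active column, so the encoder has to be fitted on data that covers the categories expected in production.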

Technology Stack

Python 3.9
XGBoost for model training
MLflow 2.4 for experiment tracking and model registry
Prefect 2.11.2 for workflow orchestration
Docker for containerization
AWS (EC2, S3, RDS) for cloud infrastructure
Terraform for infrastructure as code
Flask for model serving
Grafana for monitoring

Project Structure

.
├── .github/workflows/      # CI/CD pipelines
├── .prefect/               # Prefect configurations
├── code/
│   ├── infrastructure/     # Terraform files for AWS setup
│   ├── config/             # Grafana configuration
│   ├── tests/              # Unit tests
│   ├── model.py            # Main model training script
│   ├── model_deployment.py # Model deployment script
│   ├── simu_web_service.py # Web service simulation
│   ├── Dockerfile          # Docker configuration for model serving
│   └── docker-compose.yml  # Docker Compose for multi-container setup
├── .pre-commit-config.yaml # Pre-commit hook configurations
└── prefect.yaml            # Prefect project configuration

Core Components and Workflow

Data Preparation: Load and preprocess the coupon recommendation data.
Feature Engineering: Prepare features for model training.
Model Training: Train XGBoost model with hyperparameter tuning.
Experiment Tracking: Use MLflow to log parameters, metrics, and artifacts.
Workflow Orchestration: Manage the pipeline with Prefect.
Model Deployment: Serve the model using Flask and Docker.
Monitoring: Visualize model performance and system metrics with Grafana.

Setup and Usage

For detailed setup instructions, please refer to the code/README.md file. It covers:

Environment setup and dependency installation
AWS resource provisioning with Terraform
MLflow tracking server setup
Prefect workflow configuration
Model training and hyperparameter tuning
Model deployment using Docker
Making predictions with the deployed service

Development Flow and CI/CD Process

Local Development:
- Developers work on feature branches
- Use pre-commit hooks for code quality checks
- Run unit tests locally
Code Push and CI:
- When code is pushed to GitHub, the CI pipeline is triggered
- GitHub Actions runs automated tests and linting
- If the tests pass, the code is ready for review
Code Review and Merge:
- Pull requests are created for feature branches
- Code is reviewed by team members
- Once approved, code is merged into the main branch
Continuous Deployment:
- Merges to main branch trigger the CD pipeline
- CD pipeline builds the Docker image
- New image is pushed to container registry
- Terraform applies any infrastructure changes
- New version of the application is deployed to staging environment
Model Training and Experimentation:
- Data scientists use Prefect to orchestrate model training workflows
- Experiments are tracked using MLflow
- Best performing models are registered in the MLflow Model Registry
Production Deployment:
- After validation in staging, manual approval triggers production deployment
- Latest model is pulled from MLflow Model Registry
- Docker image is updated with new model
- Production environment is updated with new image
Monitoring and Feedback:
- Grafana dashboards monitor model performance and system health
- Feedback loop allows for continuous improvement of the model

Model Re-training Flow

The training pipeline combines Prefect for orchestration with MLflow for experiment tracking. It works as follows:

Data Ingestion and Preprocessing:
- The read_data function loads the coupon recommendation dataset from a ZIP file.
- It renames some columns and selects relevant features.
Data Splitting and Feature Engineering:
- The prepare_data_valid_set function splits the data into training, validation, and test sets.
- It uses DictVectorizer to convert categorical features into a format suitable for machine learning.
Model Training:
- The train_best_model function trains an XGBoost model using the prepared data.
- It uses MLflow to log parameters, metrics, and artifacts during the training process.
Hyperparameter Tuning:
- The main flow allows for hyperparameter tuning by accepting parameters like learning rate, max depth, etc.
- These parameters can be adjusted via command-line arguments.
Model Evaluation:
- The pipeline calculates the AUC score on the validation set to evaluate model performance.
Experiment Tracking:
- MLflow is used throughout the pipeline to track experiments, including hyperparameters, metrics, and model artifacts.
Workflow Orchestration:
- The entire pipeline is orchestrated using Prefect, with different steps defined as tasks and the overall flow defined in main_flow.
Model Persistence:
- The trained model and preprocessor are saved using MLflow, making them easy to load for deployment.
Reporting:
- A markdown report is generated with the AUC score, which can be used for easy visualization of results.
Flexibility for Testing:
- The pipeline includes an option to prepare and save test data for final model evaluation.
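
The Model Evaluation and Reporting steps above can be made concrete with a small worked example. The sketch below computes AUC directly from its definition (the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counted as half) and formats the result as a markdown snippet. It is an illustration only: the function names and the report layout are assumptions, and in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead of the pairwise loop.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_report(auc: float) -> str:
    """Format the score as a small markdown table (hypothetical layout)."""
    return (
        "# Validation Report\n\n"
        "| Metric | Value |\n"
        "|:-------|------:|\n"
        f"| AUC    | {auc:.3f} |\n"
    )


# Toy validation set: 1 = coupon accepted, 0 = rejected.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

auc = auc_score(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
report = auc_report(auc)
```

The pairwise loop is quadratic in the number of examples, so it is suitable for explaining the metric but not for large validation sets.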

This pipeline ensures a streamlined process from data ingestion to model training and evaluation, with built-in experiment tracking and workflow management. It's designed to be reproducible and easily adjustable for different hyperparameters or datasets.
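
As a rough illustration of how hyperparameters might be passed on the command line, the sketch below uses Python's standard `argparse` module. The flag names, defaults, and the `parse_hyperparameters` helper are hypothetical; the source only states that values such as learning rate and max depth are accepted as command-line arguments, so the project's real interface may differ.

```python
import argparse
from typing import Dict, List, Optional, Union


def parse_hyperparameters(argv: Optional[List[str]] = None) -> Dict[str, Union[int, float]]:
    """Parse training hyperparameters; flag names and defaults are illustrative assumptions."""
    parser = argparse.ArgumentParser(description="Train the coupon acceptance model")
    parser.add_argument("--learning-rate", type=float, default=0.1)
    parser.add_argument("--max-depth", type=int, default=6)
    parser.add_argument("--num-boost-round", type=int, default=100)
    args = parser.parse_args(argv)

    # Reject obviously invalid values before any training starts.
    if not 0.0 < args.learning_rate <= 1.0:
        parser.error("--learning-rate must be in the interval (0, 1]")
    if args.max_depth < 1:
        parser.error("--max-depth must be at least 1")
    if args.num_boost_round < 1:
        parser.error("--num-boost-round must be at least 1")

    return {
        "learning_rate": args.learning_rate,
        "max_depth": args.max_depth,
        "num_boost_round": args.num_boost_round,
    }


# Passing an explicit list keeps the example independent of the real command line.
params = parse_hyperparameters(["--learning-rate", "0.05", "--max-depth", "4"])
```

A dictionary like `params` could then be handed to the flow entry point and logged with the experiment, so that each run records the exact settings it was trained with.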

The pipeline combines several techniques to handle drift in the input data and in model performance:

Continuous Monitoring: MLflow and Prefect are used to continuously track model performance and data statistics.
Automated Retraining: When drift is detected, the pipeline can automatically trigger model retraining using the latest data.
Versioning: All models and datasets are versioned, allowing for easy rollback if needed.
Alerting: The system generates alerts when significant drift is detected, allowing for human intervention when necessary.
Adaptive Feature Engineering: The feature engineering process can be adjusted based on detected drifts.
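
The source does not say which statistic is used to detect drift. As one possible approach, the sketch below computes the Population Stability Index (PSI) for a single categorical feature by comparing its value shares in a reference sample against a recent sample. The choice of PSI, the 0.25 cut-off (a commonly cited rule of thumb), and all names are assumptions made for illustration, not the project's implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def category_shares(values: Sequence[str], categories: List[str], eps: float = 1e-6) -> Dict[str, float]:
    """Share of each category, floored at eps so the logarithm below stays defined."""
    if not values:
        raise ValueError("cannot compute shares of an empty sample")
    counts = Counter(values)
    return {c: max(counts.get(c, 0) / len(values), eps) for c in categories}


def population_stability_index(reference: Sequence[str], current: Sequence[str]) -> float:
    """PSI = sum over categories of (current - reference) * ln(current / reference)."""
    categories = sorted(set(reference) | set(current))
    ref = category_shares(reference, categories)
    cur = category_shares(current, categories)
    return sum((cur[c] - ref[c]) * math.log(cur[c] / ref[c]) for c in categories)


PSI_ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb cut-off, not taken from the project

# Toy samples of the weather feature: training-time data versus recent requests.
reference_weather = ["Sunny"] * 70 + ["Rainy"] * 20 + ["Snowy"] * 10
current_weather = ["Sunny"] * 40 + ["Rainy"] * 30 + ["Snowy"] * 30

psi = population_stability_index(reference_weather, current_weather)
drift_detected = psi > PSI_ALERT_THRESHOLD
```

A check like this could run as a scheduled step, with `drift_detected` deciding whether to raise an alert or start a retraining run.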
