This project develops an end-to-end multi-class image classification system with production-ready code. It integrates MLOps practices for robust and reproducible model development, deployment, and monitoring.
- Data Source: Chest cancer images from Kaggle.
- Model: VGG16 with a transfer learning approach.
- Workflow:
- Data Ingestion: Automated data ingestion pipeline that imports images from MongoDB.
- Training: Automated model training pipeline that downloads the VGG16 base model from TensorFlow Hub and modifies it for further fine-tuning.
- Evaluation: Automated model evaluation pipeline with model tracking using MLflow.
- Prediction: Prediction pipeline with a Flask app that uses the trained model to predict on user-uploaded images.
- CI/CD and Cloud Deployment: App is deployed on an AWS EC2 instance, ensuring scalability and high availability. The deployment process is automated using Docker and GitHub Actions for seamless updates and maintenance.
- VGG16 is a convolutional neural network (CNN) model trained on the ImageNet dataset. The architecture consists of blocks of convolutional layers, followed by max-pooling layers for down-sampling:
- Input Layer: Accepts images of size 224x224x3.
- Convolutional Layers:
- Block 1: Two 3x3 convolutions (64 filters), followed by a max-pooling layer.
- Block 2: Two 3x3 convolutions (128 filters), followed by a max-pooling layer.
- Block 3: Three 3x3 convolutions (256 filters), followed by a max-pooling layer.
- Block 4: Three 3x3 convolutions (512 filters), followed by a max-pooling layer.
- Block 5: Three 3x3 convolutions (512 filters), followed by a max-pooling layer.
- Fully Connected Layers: Flattened output is passed to two fully connected layers with 4096 units each, and a final softmax layer with 1000 outputs.
- Output Layer: Predicts probabilities for 1000 ImageNet classes.
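As a quick sanity check, this stock architecture can be loaded and inspected directly from Keras (a minimal sketch; assumes TensorFlow is installed):

```python
from tensorflow.keras.applications import VGG16

# Load VGG16 pre-trained on ImageNet with its original 224x224x3 input
# and the 1000-class classification head.
model = VGG16(weights="imagenet", include_top=True, input_shape=(224, 224, 3))
model.summary()  # lists the five convolutional blocks, the FC layers, and the softmax output
```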
- Import a pre-trained version of VGG16 from Keras.
- Freeze the weights of the initial convolutional layers to preserve the learned low-level features (like edges and textures).
- Replace the fully connected layers with a new classification head for the target classes.
- Re-train the model on the new dataset.
- Use techniques like data augmentation and early stopping to improve model generalization and avoid overfitting.
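The steps above might look roughly like this in Keras; the head size, learning rate, augmentation settings, and the `artifacts/data/train` path are illustrative assumptions, not the project's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Convolutional base without the original 1000-class head; freeze it to keep
# the learned low-level features (edges, textures).
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# New classification head for the target classes (a two-class head is assumed here).
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Data augmentation and early stopping to improve generalization.
train_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, rotation_range=20, zoom_range=0.2, horizontal_flip=True,
).flow_from_directory("artifacts/data/train",  # assumed path
                      target_size=(224, 224), batch_size=32)

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)
# model.fit(train_gen, epochs=20, callbacks=[early_stop], validation_data=...)
```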
- MLflow: Used for tracking experiments and hyperparameter tuning (see the sketch after this list).
- DAGsHub: Visualized data pipeline for streamlined data processing and management.
- DVC (Data Version Control): Implemented for data versioning to ensure reproducibility.
- Flask Application: Built a web app to provide an interface for users to upload images and receive predictions.
- Docker: Dockerized the application for consistent deployment across environments.
- GitHub Actions: Implemented CI/CD pipelines for continuous integration and deployment.
- AWS EC2: Deployed the application on AWS for scalable access.
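A minimal sketch of how a training run might be logged to MLflow; the DAGsHub tracking URI, experiment name, and parameter values are placeholders, not the project's actual configuration:

```python
import mlflow

# Point MLflow at the remote tracking server (e.g. one hosted on DAGsHub).
mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")  # placeholder URI
mlflow.set_experiment("vgg16-chest-cancer")

with mlflow.start_run():
    mlflow.log_params({"epochs": 20, "batch_size": 32, "learning_rate": 1e-4})
    mlflow.log_metrics({"accuracy": 0.95, "loss": 0.06})
    # mlflow.keras.log_model(model, "model")  # optionally log/register the trained model
```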
- Successfully implemented the project by integrating Transfer Learning and MLOps techniques.
- Obtained a model accuracy of ~95% and a loss of ~0.06.
To get started with this project, follow these steps:
- Clone the repository: `git clone https://github.com/SathvikNayak123/cancer-dl.git`
- Install the necessary dependencies: `pip install -r requirements.txt`
- Initialize DVC: `dvc init`
- Run the training pipeline: `dvc repro`
- Run the app: `python app.py`
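Once the app is running it can also be called from Python; the port, route, and upload field name below are assumptions for illustration, so check `app.py` for the actual endpoint:

```python
import requests

# Hypothetical request against the local Flask app; adjust port/route/field name to match app.py.
with open("sample_chest_scan.jpg", "rb") as f:
    response = requests.post("http://localhost:8080/predict", files={"image": f})
print(response.json())
```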
For details about the project workflow and the commands for AWS deployment, refer to the steps below.
- Create a GitHub repo with a `.gitignore`.
- Create an environment using conda.
- Set up `setup.py` if needed.
- Install `requirements.txt`.
- Run `template.py`.
- Update `config/config.yaml` and `params.yaml` for hardcoded artifact paths and model parameters.
- Update `utils/common.py` with various utility functions like `save_json`, `read_yaml`, etc. (a sketch of these helpers follows this list).
- Update `constants/__init__.py` to load `config.yaml` and `params.yaml` into the pipeline.
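The utility functions mentioned above might look roughly like this (a sketch only; assumes PyYAML, and the real `utils/common.py` may differ):

```python
import json
from pathlib import Path

import yaml  # PyYAML


def read_yaml(path: Path) -> dict:
    """Read a YAML file (e.g. config/config.yaml or params.yaml) into a dict."""
    with open(path) as f:
        return yaml.safe_load(f)


def save_json(path: Path, data: dict) -> None:
    """Persist a dict (e.g. evaluation scores) as formatted JSON."""
    with open(path, "w") as f:
        json.dump(data, f, indent=4)
```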
Repeat the following for every stage (e.g. data ingestion, training, evaluation, etc.); a skeleton sketch follows this sub-list:
- Update `src/config/artifacts_entity.py`.
- Update `src/config/config_entity.py`.
- Update `src/components`.
- Update `src/pipeline`.
- Update `src/pipeline/training_pipeline.py`.
- Update `main.py`.
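As an illustration, a data ingestion stage might follow the entity/component/pipeline pattern sketched below; the class names, fields, and paths are assumptions, not the repository's exact code:

```python
from dataclasses import dataclass
from pathlib import Path


# src/config/artifacts_entity.py: a frozen dataclass describing the stage's inputs/outputs.
@dataclass(frozen=True)
class DataIngestionConfig:
    root_dir: Path
    source_uri: str   # e.g. a MongoDB connection string read from config.yaml
    unzip_dir: Path


# src/components: the component that does the actual work for the stage.
class DataIngestion:
    def __init__(self, config: DataIngestionConfig):
        self.config = config

    def download_data(self) -> None:
        # fetch the images from the source and store them under root_dir
        ...


# src/pipeline: a thin wrapper per stage that main.py runs in order.
class DataIngestionPipeline:
    def main(self) -> None:
        config = DataIngestionConfig(
            root_dir=Path("artifacts/data_ingestion"),
            source_uri="mongodb://<connection-string>",
            unzip_dir=Path("artifacts/data_ingestion/images"),
        )
        DataIngestion(config).download_data()
```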
- Use `python main.py` to test the training pipeline.
- Create `src/pipeline/predict_pipeline.py`.
- Create the Flask app (a sketch follows this list).
- Dockerize the application.
- Create `.github/workflows/cicd.yaml`.
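A minimal sketch of the Flask app step; the routes, port, and the prediction-pipeline import are assumptions, so the real `app.py` may differ:

```python
from flask import Flask, jsonify, request

# from src.pipeline.predict_pipeline import PredictPipeline  # hypothetical import

app = Flask(__name__)


@app.route("/", methods=["GET"])
def home():
    return "Upload a chest scan image via POST /predict"


@app.route("/predict", methods=["POST"])
def predict():
    image_file = request.files["image"]   # image uploaded by the user
    image_file.save("inputImage.jpg")
    # result = PredictPipeline("inputImage.jpg").predict()
    result = {"class": "placeholder"}     # replace with the trained model's prediction
    return jsonify(result)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```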
Add the following secrets to the GitHub repository for the CI/CD workflow (demo values shown):
- `AWS_ACCESS_KEY_ID=`
- `AWS_SECRET_ACCESS_KEY=`
- `AWS_REGION=us-east-1`
- `AWS_ECR_LOGIN_URI=566373416292.dkr.ecr.ap-south-1.amazonaws.com`
- `ECR_REPOSITORY_NAME=simple-app`
- EC2: Elastic Compute Cloud, a virtual machine that hosts the application.
- ECR: Elastic Container Registry to save your Docker image in AWS.
- Build Docker image of the source code.
- Push your Docker image to ECR.
- Launch your EC2 instance.
- Pull your image from ECR in EC2.
- Launch your Docker image in EC2.
Required IAM policies:
- AmazonEC2ContainerRegistryFullAccess
- AmazonEC2FullAccess

Save the ECR repository URI (demo): `136566696263.dkr.ecr.us-east-1.amazonaws.com/mlproject`
Install Docker on the EC2 instance:
#optional
sudo apt-get update -y
sudo apt-get upgrade
#required
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
To add the EC2 instance as a self-hosted runner: Settings > Actions > Runners > New self-hosted runner > choose the OS, then run the displayed commands one by one.