The project is under development

Draft outline of the whole project. The project contains 18 scikit-learn models; all of them are defined in src/model/models.py. The CI pipeline starts with Git integration.

Provide the input data and run python main.py --sampling undersampling to train the 18 models on the undersampled dataset. To train on the oversampled dataset instead, run python main.py --sampling oversampling.
The pipeline then automatically cleans the datasets and saves the outputs as Parquet files in the data folder.
The cleaned data is resampled, using SMOTE for oversampling or NearMiss for undersampling, and then prepared for training and testing.
After preprocessing, model training starts.
Training results can be viewed with MLflow: run mlflow ui to browse them.
If a model's F1 score on the test data exceeds 0.945, the model is automatically saved under its run ID in the mlartifacts folder.
The saved models are then loaded by run_id for later stages such as registration, testing, or production deployment.
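
As an illustration of the --sampling option used in the commands above, here is a minimal sketch of how such a flag could be parsed with argparse. The function name build_parser and the default value are assumptions for this example; the actual main.py may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a --sampling flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Train the models on a resampled dataset."
    )
    parser.add_argument(
        "--sampling",
        choices=["undersampling", "oversampling"],
        default="undersampling",  # assumption: the README does not state a default
        help="Which resampled dataset to train on.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Training with the {args.sampling} dataset")
```

Restricting the flag with choices makes a mistyped value fail immediately with a usage message instead of silently training on the wrong dataset.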
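
The F1 threshold rule can be expressed as a small gate function. The sketch below assumes a binary classification task and uses scikit-learn's f1_score; the name passes_f1_gate is hypothetical, and the project may compute or average the score differently.

```python
from sklearn.metrics import f1_score

F1_THRESHOLD = 0.945  # threshold stated in this README


def passes_f1_gate(y_true, y_pred, threshold: float = F1_THRESHOLD) -> bool:
    """Return True when the test F1 score is strictly above the threshold.

    Assumes binary labels with 1 as the positive class (f1_score defaults).
    """
    return bool(f1_score(y_true, y_pred) > threshold)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    print(passes_f1_gate(y_true, [1, 1, 0, 0]))  # perfect predictions
    print(passes_f1_gate(y_true, [1, 0, 0, 0]))  # one positive missed
```

Only models for which the gate returns True would be saved; the others are still visible in the MLflow UI through their logged metrics.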
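
Loading a saved model by run_id can be sketched with MLflow's runs:/ URI scheme and mlflow.sklearn.load_model. The artifact path "model" and both helper names are assumptions for illustration; use whatever artifact path the training code actually logs the model under.

```python
def model_uri_for_run(run_id: str, artifact_path: str = "model") -> str:
    """Build an MLflow runs:/ URI (artifact_path is an assumed default)."""
    return f"runs:/{run_id}/{artifact_path}"


def load_saved_model(run_id: str, artifact_path: str = "model"):
    """Load a scikit-learn model logged under the given MLflow run."""
    # Imported lazily so the URI helper above works without MLflow installed.
    import mlflow.sklearn

    return mlflow.sklearn.load_model(model_uri_for_run(run_id, artifact_path))


if __name__ == "__main__":
    # "abc123" is a placeholder; copy a real run ID from the MLflow UI.
    print(model_uri_for_run("abc123"))
```

The loaded object is a regular scikit-learn estimator, so it can be passed on to a testing or deployment stage without further conversion.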

Planned additions:

Store the training and testing data plots in the MLflow database.
Build a website and deploy the project to it.
Write unit tests and add automatic code quality checks.
Check the training process with Prefect.
The CD pipeline starts with GitHub Actions configuration and covers:

retesting the model with more data.
monitoring the model with EvidentlyAI and Grafana, running in Docker, for model degradation, data degradation, data drift, and more.
The whole process is then deployed to a server or a Docker container for further use.

Repository contents:
archieve
examples
src
.gitignore
MLproject
README.md
conda.yaml
main.py
model_params.yaml
python_env.yaml

ThomasHeinThura/Testing-automation-ETL-pipeline

Folders and files

Latest commit

History

Repository files navigation

The project is under development

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages