End-of-studies project at CentraleSupelec. The goal is to discover and test the RAPIDS ecosystem, focusing on the cuML library.
Follow these steps:
- Install Conda
- Create a working environment for RAPIDS using the following explainer: link. It will install all the dependencies required to run cuML.
- Activate the environment
conda activate env_name
- Install the other dependencies listed for each sub-project.
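Once the environment is activated, a quick sanity check can confirm that the expected packages are visible to Python. This is a minimal sketch, not part of the project: the helper name `missing_packages` is hypothetical, and checking `cudf` alongside `cuml` assumes it was installed with the rest of RAPIDS.

```python
import importlib.util


def missing_packages(names):
    """Return the packages from `names` that cannot be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list; adjust it to the sub-project you want to run.
    missing = missing_packages(["cuml", "cudf"])
    print("missing:", missing or "none")
```

This only checks that the packages can be located. It does not verify that a compatible GPU and driver are present.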
You can download the data using the following links:
- Digit Recognizer
- Detecting Malicious URLs (use the SVM-light version).
Script to benchmark different clustering algorithms. Code adapted from here.
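The core of such a benchmark can be sketched as a small timing harness. This is an illustration only, not the adapted script: the `benchmark` function and its arguments are hypothetical, and each entry is assumed to be a callable that fits one algorithm on the data (for example `lambda X: KMeans(n_clusters=8).fit(X)`).

```python
import time


def benchmark(fit_functions, data, repeats=3):
    """Return the best wall-clock time (in seconds) of each fit function on `data`."""
    results = {}
    for name, fit in fit_functions.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fit(data)
            timings.append(time.perf_counter() - start)
        # Keep the best run: the first call often pays one-time warm-up costs.
        results[name] = min(timings)
    return results
```

Passing the same data to a CPU and a GPU implementation under different names gives a side-by-side comparison. Taking the minimum over several runs is a design choice here; a mean or median may suit noisy machines better.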
Test cuML kNN on real data. Code adapted from here.
A script showing the most basic use of the cuML KMeans implementation.
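For orientation, basic KMeans usage might look like the sketch below. It is not the project's script. The helpers `load_kmeans` and `cluster` are hypothetical, and the fallback to scikit-learn is an assumption added so the sketch also runs on a machine without a GPU (the two classes share `n_clusters`, `random_state`, `fit` and `predict`).

```python
import importlib


def load_kmeans():
    """Return cuML's KMeans class if importable, else scikit-learn's."""
    for module_name in ("cuml.cluster", "sklearn.cluster"):
        try:
            return importlib.import_module(module_name).KMeans
        except ImportError:
            continue
    raise ImportError("Neither cuML nor scikit-learn is installed")


def cluster(points, n_clusters, kmeans_cls=None):
    """Fit KMeans on `points` and return one cluster label per point."""
    kmeans_cls = kmeans_cls or load_kmeans()
    model = kmeans_cls(n_clusters=n_clusters, random_state=0)
    model.fit(points)
    return model.predict(points)
```

A typical call would be `cluster(X, n_clusters=3)`, with `X` a 2-D NumPy `float32` array of shape `(n_samples, n_features)`.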
Test of an integration of cuML in a Flask API.
Run the app with: python app.py
You can pretrain the model with the model_maker.py script; make sure to set the dataset path properly.
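To give an idea of how a pretrained model can sit behind a Flask endpoint, here is a minimal sketch. It does not reproduce app.py: the `/predict` route, the `{"features": [...]}` payload shape, and the helpers `predict_payload` and `create_app` are all assumptions made for illustration. The model only needs a `predict` method.

```python
def predict_payload(model, payload, convert=lambda rows: rows):
    """Validate a JSON payload and return (response_body, http_status)."""
    rows = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(rows, list) or not rows:
        return {"error": "expected JSON like {'features': [[...], ...]}"}, 400
    # `convert` turns the nested lists into whatever array type the model expects.
    labels = model.predict(convert(rows))
    return {"predictions": [int(label) for label in labels]}, 200


def create_app(model, convert=lambda rows: rows):
    """Wire predict_payload into a Flask app (Flask is imported lazily)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        payload = request.get_json(silent=True)  # None if the body is not JSON
        body, status = predict_payload(model, payload, convert)
        return jsonify(body), status

    return app
```

With a real model, passing `convert=lambda rows: numpy.asarray(rows, dtype="float32")` is a reasonable choice. Keeping the validation in a plain function makes it testable without starting a server.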
Set the right dataset path and launch the script using python trainer_standalone.py
- Set the right dataset path in this script.
- Make sure Kafka is running; you can follow this
- Launch the mock producer
python src/mock_producer/main.py
- Launch the trainer
python src/trainer/main.py
- Launch the metric collector
python src/metrics_garbage/main.py
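The role of the mock producer in the steps above can be sketched as follows. This is not the code in src/mock_producer: the message layout (`timestamp`, `features`), the topic name, and the helpers `make_message` and `produce` are hypothetical. The Kafka client is injected as a `send(topic, value)` callable so the sketch stays independent of any particular client library.

```python
import json
import random
import time


def make_message(n_features, rng):
    """Build one JSON-encoded sample with random feature values."""
    sample = {
        "timestamp": time.time(),
        "features": [rng.random() for _ in range(n_features)],
    }
    return json.dumps(sample).encode("utf-8")


def produce(send, topic, n_messages, n_features=4, seed=0):
    """Send `n_messages` random samples to `topic` through `send(topic, value)`."""
    rng = random.Random(seed)  # seeded so the feature values are reproducible
    for _ in range(n_messages):
        send(topic, make_message(n_features, rng))
    return n_messages
```

Assuming the kafka-python client, `KafkaProducer(bootstrap_servers="localhost:9092").send` has a compatible signature and could be passed as `send`. The project may use a different client or broker address.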
Notebooks to plot and analyze the collected metrics.