Reproducible ImageNet training with Ignite

In this example, we provide scripts and tools to run reproducible experiments on training neural networks on the ImageNet dataset.

Features:

There are two options for experiment tracking: 1) MLflow or 2) Polyaxon. MLflow is better suited to a local machine with GPUs. Polyaxon must be installed on a machine, cluster, or cloud, and experiments are then scheduled with polyaxon-cli. Choose one option and skip the description of the other.

Implementation details

Files tree description:

code
  |___ dataflow : module providing data loaders and various transformers
  |___ scripts : executable training scripts
  |___ utils : other helper modules

configs
  |___ train : training python configuration files  
  
experiments 
  |___ mlflow : MLflow related files
  |___ plx : Polyaxon related files
 
notebooks : jupyter notebooks to check specific parts from code modules 

Code and configs

We use the py_config_runner package to execute Python scripts with Python configuration files.
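The idea of pairing a script with a Python configuration file can be sketched with the standard library alone. This is not py_config_runner's implementation, only an illustration of the pattern: a configuration file is ordinary Python, it is loaded as a module, and the result is passed to the script's run function. The helper load_config, the file name baseline_config.py, and the fields model_name, num_epochs and batch_size are all hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_config(path):
    """Load a Python file as a module so its variables act as config fields."""
    spec = importlib.util.spec_from_file_location("config", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run(config, **kwargs):
    """Entry point of a script; here it only summarises the configuration."""
    return (
        f"model={config.model_name}, "
        f"epochs={config.num_epochs}, "
        f"batch_size={config.batch_size}"
    )


# Hypothetical configuration file content (plain Python assignments).
CONFIG_SOURCE = '''
model_name = "resnet50"
num_epochs = 2
batch_size = 64
'''

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "baseline_config.py"
    config_path.write_text(CONFIG_SOURCE)
    config = load_config(config_path)
    summary = run(config)

print(summary)
```

Keeping the configuration in Python rather than in a static format means a config file can build objects (models, optimizers, data loaders) directly, at the cost of executing arbitrary code when it is loaded.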

Training scripts

Training scripts are located in code/scripts and include:

  • mlflow_training.py, training script with MLflow experiments tracking
  • plx_training.py, training script with Polyaxon experiments tracking
  • common_training.py, common training code used by above files

Each training script contains a run method, which py_config_runner requires in order to run a script with a configuration. The training logic is set up inside the training method, which configures a distributed trainer, two evaluators, and logging handlers for TensorBoard, the MLflow or Polyaxon logger, and tqdm.

Configurations

Results

Model     | Training Top-1 Accuracy | Training Top-5 Accuracy | Test Top-1 Accuracy | Test Top-5 Accuracy
ResNet-50 | 78%                     | 92%                     | 77%                 | 94%
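For readers unfamiliar with the metrics in the table, top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. The following is a minimal pure-Python sketch with made-up scores; it is not the code used to produce the numbers above, and the function name top_k_accuracy is hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Made-up scores for 4 samples over 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```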

Acknowledgements

Part of the training was done within the Tesla GPU Test Drive on 2 Nvidia V100 GPUs.