
Reproducible PASCAL VOC2012 training with Ignite

In this example, we provide scripts and tools for running reproducible experiments that train neural networks on the PASCAL VOC2012 dataset.

Features:

Distributed training with mixed precision by nvidia/apex
Experiments tracking with MLflow or Polyaxon

There are two options for experiment tracking: 1) MLflow or 2) Polyaxon. MLflow is better suited to a local machine with GPUs. Polyaxon requires the user to have Polyaxon installed on a machine, cluster, or cloud, where experiments are scheduled with polyaxon-cli. Choose one option and skip the description of the other.

Notes for experiments tracking with MLflow
Notes for experiments tracking with Polyaxon

Implementation details

Files tree description:

code
  |___ dataflow : module provides data loaders and various transformers
  |___ scripts : executable training scripts
  |___ utils : other helper modules

configs
  |___ train : training python configuration files  
  
experiments 
  |___ mlflow : MLflow related files
  |___ plx : Polyaxon related files
 
notebooks : jupyter notebooks to check specific parts from code modules

Code and configs

py_config_runner

We use the py_config_runner package to execute Python scripts with Python configuration files.
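To illustrate the idea of running a script with a Python configuration file, here is a minimal sketch. It is a simplified stand-in written with the standard library only, not py_config_runner's actual implementation or API. The helper `load_config`, the function `demo`, the file name `example_config.py`, and the settings `num_epochs` and `batch_size` are all hypothetical names chosen for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_config(path: str) -> ModuleType:
    """Import a Python file as a module so its top-level names act as config values."""
    spec = importlib.util.spec_from_file_location("config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run(config: ModuleType) -> dict:
    """Hypothetical script entry point: reads its settings from the config object."""
    return {"epochs": config.num_epochs, "batch_size": config.batch_size}


def demo() -> dict:
    # Write a tiny, made-up config file, then run the "script" with it.
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = Path(tmp) / "example_config.py"
        # Because the config is plain Python, values can be computed expressions.
        cfg_path.write_text("num_epochs = 2\nbatch_size = 8 * 2\n")
        return run(load_config(str(cfg_path)))


if __name__ == "__main__":
    print(demo())
```

Keeping the configuration in a Python file means a config can define objects and computed values directly, while the script only needs a single entry point that receives the loaded configuration.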

Training scripts

Training scripts are located in code/scripts and consist of:

mlflow_training.py, training script with MLflow experiments tracking
plx_training.py, training script with Polyaxon experiments tracking
common_training.py, common training code used by above files

Each training script contains a run method, which py_config_runner requires in order to run a script with a configuration. The training logic is set up inside the training method, which configures a distributed trainer, two evaluators, and logging handlers for TensorBoard, the MLflow/Polyaxon logger, and tqdm.

Configurations

baseline_resnet101.py : trains DeepLabV3-ResNet101 on the PASCAL VOC2012 dataset only
baseline_resnet101_sbd.py : trains DeepLabV3-ResNet101 on the PASCAL VOC2012 dataset with SBD

Results

Model	with SBD	Training mIoU+BG	Test mIoU+BG
DeepLabV3 ResNet-101	X	86%	68%
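
For readers unfamiliar with the metric, the sketch below shows one common way to compute mean IoU from a confusion matrix. It assumes that "mIoU+BG" means the mean intersection-over-union averaged over all classes with the background class included; this is an interpretation, not something stated above. The function `mean_iou` and the toy matrix are hypothetical and are not taken from the repository's code.

```python
def mean_iou(confusion):
    """Mean IoU over all classes, background included.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                                        # true positives
        fn = sum(confusion[c]) - tp                                 # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # wrongly predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes that never occur in labels or predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy two-class example: class 0 is background, class 1 is an object class.
    toy = [[8, 2],
           [1, 9]]
    print(mean_iou(toy))
```

In the toy example the per-class IoUs are 8/11 and 9/12, so the mean is about 0.739.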

Acknowledgements

Part of the training was done within the Tesla GPU Test Drive on 2 Nvidia V100 GPUs.
