An example of how I configure a computer vision repo.
This repo uses a unique method to configure complex pipelines that I've developed and iterated on over the years. Config schemas are defined as composable dataclasses and instantiated in a Python module rather than in YAML. Specifying the config as YAML is still an option, and the config is always saved for reproducibility.
Specifying configs in Python provides all of the flexibility of code (for loops, if/else statements, generators, sampling, etc.) as well as all of the help of the IDE (autocompletion, type checking, go-to-definition). This greatly speeds up iteration cycles, since you catch many bugs before anything runs.
See run_experiments.py for an example.
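As a rough illustration of the idea (this is a minimal sketch, not the code in run_experiments.py), the snippet below composes a few dataclasses into an experiment config and builds a small sweep with an ordinary loop. All class names, field names, and the `make_lr_sweep` helper are hypothetical, and the config is serialized to JSON only to stay within the standard library; the repo itself saves YAML.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class OptimizerConfig:  # hypothetical schema, for illustration only
    name: str = "adam"
    lr: float = 1e-3


@dataclass
class ModelConfig:  # hypothetical schema, for illustration only
    arch: str = "resnet18"
    num_classes: int = 10


@dataclass
class ExperimentConfig:
    """Top-level config composed from smaller config dataclasses."""

    name: str
    model: ModelConfig = field(default_factory=ModelConfig)
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)

    def __post_init__(self):
        # Fail when the config is built, not deep into a training run.
        if self.optimizer.lr <= 0:
            raise ValueError(f"lr must be positive, got {self.optimizer.lr}")


def make_lr_sweep(learning_rates):
    """Build one experiment per learning rate using plain Python."""
    return [
        ExperimentConfig(name=f"cifar10_lr{lr:g}", optimizer=OptimizerConfig(lr=lr))
        for lr in learning_rates
    ]


experiments = make_lr_sweep([1e-2, 1e-3, 1e-4])

# Serialize the fully resolved config so the run can be reproduced later.
serialized = json.dumps(asdict(experiments[0]), indent=2)
print(serialized)
```

Because the sweep is ordinary Python, the IDE can autocomplete field names and a type checker can flag a misspelled or mistyped field before any experiment is launched.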
Install poetry: https://python-poetry.org/docs/#installation
git clone git@github.com:egafni/ComputerVision.git
poetry install
# If you have CUDA 11 installed, overwrite torch with the matching CUDA build
poetry run pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113 -I --no-deps
poetry shell
poetry run pytest
# specify your Experiment in run_experiments.py
$ poetry run python run_experiments.py -e ExperimentGroupName -m run
examples:
$ poetry run python scripts/run_experiments.py -e Cifar10 -m run
# something is weird about the DTD dataset causing resizing to fail, need to fix or change datasets
$ poetry run python scripts/run_experiments.py -e DTD -m run
Dockerfile to build the image
Add CI/CD. I like to use GitLab or GitHub workflows that build the Docker container and run the tests. Often I will host the test runner on the development server. For more advanced projects, I'll have the tests run on a local server so that the Docker images are easily cached and the tests have access to GPUs and production/R&D data.
The scripts/submit.py script allows you to run any arbitrary command (such as training a model) in the cloud in the exact environment of the current repository.
$ scripts/submit.py --machine-type ... $COMMAND
# ex:
# scripts/submit.py --machine-type ... train.py --config train_config.yaml
- Builds and pushes the docker container
- Runs $COMMAND on an instance in the cloud inside the pushed Docker container
- Streams the output of the command to the console
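The three steps above can be sketched roughly as follows. This is not the actual scripts/submit.py: the `plan_submit` and `stream` helpers and the image name are hypothetical, and the final `docker run` is only a local stand-in for launching the container on a cloud instance, since that step depends on the cloud provider.

```python
import subprocess
import sys


def plan_submit(image, command):
    """Return the build, push, and run commands in the order they would execute."""
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        # Stand-in only: a real submit script would start this on a cloud instance.
        ["docker", "run", "--rm", image, *command],
    ]


def stream(cmd):
    """Run cmd, echo each output line as it arrives, and return (exit code, lines)."""
    lines = []
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            print(line, end="")  # forward output to the console immediately
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here.
    return proc.returncode, lines


# Hypothetical image name; the command mirrors the example above.
steps = plan_submit(
    "registry.example.com/cv:latest",
    ["train.py", "--config", "train_config.yaml"],
)
for step in steps:
    print(" ".join(step))

# Demonstrate streaming with a harmless local command instead of Docker.
code, out = stream([sys.executable, "-c", "print('hello from the job')"])
```

Keeping the plan as data (a list of argument lists) makes it easy to print or log the exact commands before anything is executed.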
$ tensorboard --logdir experiments