This repo is used to generate the more public-facing labs repo.
Changes made here will not be reflected there without intervention. Currently, this is a manual process.
If you want to set up a development environment for this repo,
you can follow the same instructions as the students do:
`instructions/setup/readme.md`.
Any changes that are made to the user-facing process should be documented there, not here.
FYI, we are also experimenting with `devcontainer`s,
which combine a container backend with a VS Code frontend,
as a solution for setting up environments.
GitHub can host `devcontainer`s via Codespaces,
and core contributors can get access to GPU-accelerated Codespaces for development.
As of the 2022 offering of the course,
this environment is not being directly maintained and is not documented in the notes below.
After following those instructions, run `pre-commit install`
to add pre-commit checks, like linting.
These are also run in CI,
but it's convenient to find and fix small nits locally before pushing.
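For reference, a `.pre-commit-config.yaml` configuring a couple of common hooks might look like the sketch below; the hook repos and versions here are illustrative, not this repo's actual configuration:

```yaml
# Illustrative pre-commit configuration -- not this repo's actual hooks or pins.
repos:
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 4.0.1
    hooks:
      - id: flake8
```

Running `pre-commit install` wires these hooks into `git commit`, and `pre-commit run --all-files` applies them to the whole repo on demand.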
The notes below describe how to reliably update the environment for the labs, from system libraries up to Python packages.
Before explaining the details of the environment,
it's helpful to take a step back and clearly state
what problem it is solving.
We want an environment for the labs that
- evinces best practices in and matches the reality of ML product development;
- can be set up quickly and easily by novice/intermediate developers; and
- stays reasonably in line with Google Colab.
The purpose of the course is to teach ML product development, from soup to nuts. One strength of the course is its closeness to "real" ML product development, including the tools and workflows used.
Here are some of the features of ML development we want to mimic:
- Development is done by a team with varying levels of SWE expertise, so tools should be easy to learn and mainstream.
- Development includes best practices like testing, linting, CI/CD.
- Training requires GPU acceleration.
- Deployment is based on containerization.
We want to limit the difficulty of the setup while keeping the process simple enough to be easily explained to students and tinkered with.
That means running the entire class inside a user-managed container is out, as are other means of providing a completely pre-built environment.
We compromise by using a transparent `Makefile`
that uses only limited `make` features.
The user experience roughly corresponds to joining a well-run team
with a canonical environment/build process already in place.
We want to keep our environment reasonably in line with Colab, so that the labs run on that platform.
This serves two very important purposes:
- Colab provides an "out" in case the setup is not easy enough. Setup on Colab is perforce automated.
- Colab provides GPU acceleration, which can be expensive, for free.
The Colab environment is a shifting target -- they seem to update PyTorch two weeks after release each time. Due to the limited support for automation in Colab, the best way to do things like check the current version of libraries and run tests is to manually execute a notebook. Here's one that checks that the environment is as expected and runs tests. It should be run from beginning to end with Runtime > Run all, but note that you have to provide a secret interactively in the final cell.
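The same kind of environment check can be scripted. A minimal sketch of version checking — the contents of `EXPECTED` here are placeholders, not the course's actual pins:

```python
import importlib.metadata

# Placeholder expectations -- swap in the packages and versions the labs pin.
# A value of None means "just check that the package is installed".
EXPECTED = {"pip": None}

def check_environment(expected):
    """Map each package name to (installed_version, ok_flag)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            report[pkg] = (None, False)
            continue
        report[pkg] = (have, want is None or have == want)
    return report

print(check_environment(EXPECTED))
```

Any package whose flag is `False` is either missing or at the wrong version.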
We aim for bug-free execution in the following environments:
- Ubuntu 18.04 LTS
- Google Colab
- (prod only) Amazon Linux 2
- (prod only) Debian Buster
As of writing, support for Windows Subsystem for Linux 2 (WSL 2) is in alpha.
`conda` provides virtual environments, system package installation (including Python runtimes),
and Python package installation.
We use it for virtual environments, system package installation, and Python runtime installation.
`poetry` also provides virtual environments and Python runtime installation,
but it does not work well for installing system packages,
and our core libraries are tightly intertwined with the system packages CUDA and CUDNN.
It may become a better choice than `conda` in the future.
We use `conda` to install and manage the Python runtime for users of the labs.

Python runtimes for the production app and for CI are determined by Docker images,
but the `conda` environment is the source of truth.
So the Python version is mentioned in the following places:

- `environment.yml`, which describes the `conda` environment
- `.github/workflows/*.yml`, which describe the CI environment
- `app_gradio/Dockerfile` and `api_serverless/Dockerfile`, which describe the production app environment

Changes need to be synchronized by hand.
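Since synchronization is manual, a small script can at least detect drift. A rough sketch — the snippets below are illustrative stand-ins for the real files, and the regex is an assumption about their formats:

```python
import re

# Matches version strings like "python=3.9", 'python-version: "3.9"',
# and "FROM python:3.9-slim".
VERSION_RE = re.compile(r'python[^0-9\n]{0,12}(3\.\d+)', re.IGNORECASE)

def python_versions(text):
    """Return the set of MAJOR.MINOR Python versions mentioned in text."""
    return set(VERSION_RE.findall(text))

# Illustrative file contents -- in practice, read the actual files listed above.
snippets = {
    "environment.yml": "dependencies:\n  - python=3.9\n",
    ".github/workflows/test.yml": 'with:\n  python-version: "3.9"\n',
    "app_gradio/Dockerfile": "FROM python:3.9-slim\n",
}

found = {name: python_versions(text) for name, text in snippets.items()}
assert len(set.union(*found.values())) == 1, f"version drift: {found}"
print("all files agree:", found)
```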
We use `conda` to install the GPU acceleration libraries CUDA and CUDNN.
They are needed for training but not for inference,
so the production app environment does not require them.

The CUDA/CUDNN versions are mentioned in the following places:

- `environment.yml`, which describes the `conda` environment
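For reference, the relevant portion of an `environment.yml` pinning these libraries might look like the following; the versions here are placeholders, not the course's actual pins:

```yaml
# Illustrative conda environment -- versions are placeholders.
name: labs
channels:
  - defaults
dependencies:
  - python=3.9
  - cudatoolkit=11.3
  - cudnn=8.2
```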
Note that installing the NVIDIA drivers on which these depend is a fairly involved, often manual process. We place it out of scope and presume they are present.
If your (Linux) system does not have the required drivers, which will be indicated by a warning when importing `torch`, see these instructions, which were up-to-date as of 2022-04-13. Godspeed.
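A quick way to check whether the drivers and GPU libraries are working is to ask `torch` directly. A minimal sketch that handles environments where `torch` isn't installed at all:

```python
def cuda_status():
    """Return a short description of GPU availability in this environment."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch installed, but CUDA unavailable (check NVIDIA drivers)"

print(cuda_status())
```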
Python packages are installed via `pip`,
with dependencies resolved and pinned by `pip-tools`.
Most high-level requirements are set in
`requirements/prod.in` and `requirements/dev.in`.
Python build tools, e.g. `pip`, `pip-tools`, and `setuptools`,
are specified elsewhere; see below.
Python code quality (e.g. linting, doc checking)
is enforced via `pre-commit`,
so the source of truth for versioning of those tools
is in `.pre-commit-config.yaml`.
They are also repeated in `requirements/dev-lint.in`
so they can be optionally installed into the development environment.
The `.in` files are "compiled" by `pip-tools`
to generate concrete `.txt` requirements files.
This ensures reproducible transitive dependency management
for Python packages installed by `pip`.
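As a sketch of the relationship between the two file types — the package names and versions below are illustrative, not the repo's actual requirements:

```text
# requirements/prod.in -- high-level, loosely pinned
requests>=2.27

# requirements/prod.txt -- compiled by pip-tools, fully pinned,
# including transitive dependencies
requests==2.27.1
certifi==2021.10.8          # via requests
charset-normalizer==2.0.12  # via requests
idna==3.3                   # via requests
urllib3==1.26.9             # via requests
```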
This choice effectively limits us to a single OS. To support multiple platforms, we would need to produce "compiled" requirements files for each one and confirm tests pass in each case. This can be automated by using cloud runners for each platform, but we place this out of scope.
It is possible to use `conda` to install all packages,
which would have the salutary effect of limiting the number of tools
and unifying versioning and build information into one place.
However, that would create an extra, fairly heavy dependency in our Docker images.
We would either need to restrict the images we consider
(only those with `conda`, which might include lots of other things we don't want)
or include the `conda` build step in our Docker build.
Producing a `pip`-friendly file from `conda` requires `conda-lock`.
We end up with even greater differences between our dev and prod environment setup,
and `conda-lock` is a less-established tool (it's in the `conda-incubator`).
It's also fairly heavy (e.g. it depends on `poetry`) and moves many of our dependencies to the `conda-forge` channel.
`conda` also does not play nicely with Colab.
The grok-ai `nn-template` has a similar approach.
They use `conda` for Python, CUDA, and CUDNN
and `pip` for almost everything else.
They install `torch` with `conda`,
which is worth considering for extra robustness,
but they don't target Colab or Docker.
To get reproducible builds, we need deterministic build tools.
That means precisely pinned versions for:

- `pip`
- `setuptools`
- `pip-tools`

These versions are specified in:

- the `Makefile`'s `pip-tools` targets (for users)
- the `Dockerfile`s (for production)

They are not currently pinned in CI.
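As a sketch of what such pinning can look like in the `Makefile` — the target shape and versions below are placeholders, not the repo's actual pins:

```make
# Hypothetical bootstrap target; tool versions are placeholders.
pip-tools:
	python -m pip install pip==22.0.4 setuptools==62.1.0 pip-tools==6.6.0
	pip-compile requirements/prod.in
	pip-compile requirements/dev.in
```

Pinning the build tools themselves, not just the packages they install, keeps the "compile" step deterministic across machines.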
These are the libraries required to run the app in production.
We aim to keep this environment lean, to evince best practices for Dockerized web services.
They are specified at a high level in `requirements/prod.in`.
After updating the contents of `prod.in`,
run `make pip-tools` to perform any necessary updates to the compiled `prod.txt`
and update the local environment.
This may also change downstream environments, e.g. `dev`.
These are the libraries required to develop the model,
e.g. for training.
This is also the curriculum development environment:
through the course, students learn to use the same tools
we use to manage the development of the material.
They are specified at a high level in `requirements/dev.in`,
which depends on `requirements/prod.in`.
After updating the contents of either `prod.in` or `dev.in`,
run `make pip-tools` to perform any necessary updates to the compiled `dev.txt`
and update the local environment.