Welcome to the runtime repository for the Youth Mental Health: Automated Abstraction challenge on DrivenData! This repository contains a few things to help you create your code submission for this code execution competition:
- Example submission (example_submission/): a simple demonstration solution that runs successfully in the code execution runtime and outputs a valid submission. It provides the function signatures that you should implement in your solution.
- Runtime environment specification (runtime/): the definition of the environment in which your code will run.
You can use this repository to:
🔧 Test your submission: Test your submission using a locally running version of the competition runtime to discover errors before submitting to the competition website.
📦 Request new packages in the official runtime: Since your submission will not have general access to the internet, all dependencies must be pre-installed. If you want to use a package that is not already in the runtime environment, make a pull request to this repository. Make sure to test out adding the new package to both official environments (CPU and GPU).
Changes to the repository are documented in CHANGELOG.md.
- Code submission format
- Running your submission locally
- Running the example submission locally
- Smoke tests
This quickstart guide will show you how to get started using this repository.
When you make a submission on the DrivenData competition site, we run your submission inside a Docker container, a virtual operating system that allows for a consistent software environment across machines. The best way to make sure your submission will run successfully is to test it in a container on your local machine first. For that, you'll need:
- A clone of this repository
- Docker
- At least 5 GB of free space for the CPU version of the Docker image or at least 15 GB of free space for the GPU version
- GNU make (optional, but useful for running the commands in the Makefile)
Additional requirements to run with GPU:
- NVIDIA drivers with CUDA 11
- NVIDIA container toolkit
In the official code execution platform, code_execution/data will contain features for the test set. See the code submission page for details of the code_execution/data/test_features.csv file.
To test your submission in a local container, save a file under data/test_features.csv that matches the format of the actual test features file. For example, you could use a set of training examples. When you run your submission in a Docker container locally, the file you provide will be included in the container.
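As a minimal sketch of the "use a set of training examples" idea, the helper below copies the header and the first few rows of a training features CSV into data/test_features.csv. The function name and the input path are hypothetical; substitute whatever training features file you downloaded from the competition site.

```python
import csv
from pathlib import Path


def make_local_test_features(train_path, out_path, n_rows=100):
    """Copy the header plus the first n_rows data rows of a CSV file.

    Uses the csv module (not line slicing) so that quoted fields
    containing newlines are kept intact.
    Returns the number of data rows written.
    """
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(train_path, newline="", encoding="utf-8") as src:
        with open(out_path, "w", newline="", encoding="utf-8") as dst:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            writer.writerow(next(reader))  # header row
            for row in reader:
                if written >= n_rows:
                    break
                writer.writerow(row)
                written += 1
    return written
```

For example, `make_local_test_features("path/to/training_features.csv", "data/test_features.csv", n_rows=200)` would give you a small local file to run against, where the first path is a placeholder for your own copy of the training features.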
We also provide a script for you to evaluate your generated predictions using known training set labels. src/scoring.py takes the path to your predictions and the path to the corresponding labels, and calculates the variable-averaged F1 score per the competition performance metric.
$ python src/scoring.py submission/submission.csv data/train_labels.csv
Variable-averaged F1 score: 0.0061
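To build intuition for the reported number, here is a simplified sketch of an averaged F1 computation. It assumes every variable is binary (0/1) and takes the plain mean of the per-variable F1 scores. This is not the code in src/scoring.py: the official metric is defined on the competition page and may treat some variables differently. The function names are illustrative only.

```python
def binary_f1(y_true, y_pred):
    """F1 score for one binary variable; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def variable_averaged_f1(labels, preds):
    """Mean of per-variable F1 scores.

    labels, preds: dicts mapping a variable name to a list of 0/1 values,
    with rows in the same order in both (an assumption of this sketch).
    """
    scores = [binary_f1(labels[name], preds[name]) for name in labels]
    return sum(scores) / len(scores)
```

Averaging per variable means a rarely positive variable counts as much as a common one, so a model that ignores rare variables is penalized.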
As you develop your own submission, you'll need to know a little bit more about how your submission will be unpacked for running inference. This section contains more complete documentation for developing and testing your own submission.
Your final submission should be a zip archive named with the extension .zip (for example, submission.zip).
A template for main.py is included at example_submission/main.py. For more detail, see the "what to submit" section of the code submission page.
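Before uploading, it can be worth confirming that main.py sits at the top level of the archive rather than inside a subfolder. The check below is a small sketch using the standard library; the function name is hypothetical, and the authoritative layout requirements are on the code submission page.

```python
import zipfile


def has_top_level_main(zip_path):
    """Return True if the archive contains main.py at its root."""
    with zipfile.ZipFile(zip_path) as zf:
        return "main.py" in zf.namelist()
```

Calling `has_top_level_main("submission/submission.zip")` returns False if, for instance, you zipped the parent folder and ended up with submission_src/main.py inside the archive.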
This section provides instructions on how to run your submission in the code execution container from your local machine. To simplify the steps, key processes have been defined in the Makefile. Commands from the Makefile are then run with make {command_name}. The basic steps are:
make pull
make pack-submission
make test-submission
Run make help for more information about the available commands as well as information on the official and built images that are available locally.
Here's the process in a bit more detail:
- First, make sure you have set up the prerequisites.
- Run make pull to download the official competition Docker image.
  Note: If you have built a local version of the runtime image with make build, that image will take precedence over the pulled image when using any make commands that run a container. You can explicitly use the pulled image by setting the SUBMISSION_IMAGE shell/environment variable to the pulled image or by deleting all locally built images.
- Save all of your submission files, including the required main.py script, in the submission_src folder of the runtime repository. Make sure any needed model weights and other assets are saved in submission_src as well.
- Run make pack-submission to create a submission/submission.zip file containing your code and model assets. This submission.zip file is what you will ultimately submit on the competition website.
  make pack-submission
  #> mkdir -p submission/
  #> cd submission_src; zip -r ../submission/submission.zip ./*
  #> adding: main.py (deflated 73%)
- Run make test-submission to simulate what happens during code execution on your local machine. This command launches an instance of the competition Docker image and runs the container entrypoint script. First, it unzips submission/submission.zip into /code_execution/ in the container. Then, it runs your submitted main.py. In the local testing setting, the final submission is saved out to the submission/ folder on your local machine. This is the same inference process that will take place in the official runtime.
  make test-submission
Note: Remember that /code_execution/data is just a mounted version of what you have saved locally in data, so you will just be using the training files for local testing. In the official code execution platform, /code_execution/data will contain the actual test data.
🎉 Congratulations! You've just completed your first test run for the Youth Mental Health: Automated Abstraction challenge. If everything worked as expected, you should see that a new file submission/submission.csv has been generated.
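A quick sanity check on the generated file is to compare it with a submission format file, if you have one saved locally (for example at data/submission_format.csv). The sketch below assumes both files are CSVs with a header row and that the first column identifies each row; those assumptions and the function names are ours, not something this repository guarantees.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of rows (each row a list of strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def check_against_format(submission_path, format_path):
    """Return a list of human-readable problems; empty means none found."""
    sub = read_rows(submission_path)
    fmt = read_rows(format_path)
    if not sub or not fmt:
        return ["one of the files is empty"]
    problems = []
    if sub[0] != fmt[0]:
        problems.append("column names differ")
    if len(sub) != len(fmt):
        problems.append(
            f"row count differs: {len(sub) - 1} vs {len(fmt) - 1}"
        )
    elif [r[:1] for r in sub[1:]] != [r[:1] for r in fmt[1:]]:
        # Assumes the first column holds row identifiers.
        problems.append("first-column values differ")
    return problems
```

An empty list from `check_against_format("submission/submission.csv", "data/submission_format.csv")` only means the shapes line up; it says nothing about whether the predicted values are valid.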
When you run make test-submission, the logs will be printed to the terminal and written out to submission/log.txt. If you run into errors, use the container logs written to log.txt to determine what changes you need to make for your code to execute successfully.
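When the log is long, a small filter can help you jump to the relevant lines. The following sketch scans a log file for a few marker strings; the markers and the function name are our own choice and not part of the runtime, so adjust them to whatever your code prints.

```python
def find_error_lines(log_path, markers=("Traceback", "Error", "Exception")):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if any(marker in line for marker in markers):
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

For example, `find_error_lines("submission/log.txt")` returns the line numbers to look at first; the surrounding lines in the log usually carry the full traceback.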
Before you test your own submission, you can test the process above with the provided example submission first. This will follow the same process as running your submission, but will use the code in example_submission instead of the code in submission_src.
To run the example submission using make commands, make sure that Docker is running and then run the following in the terminal:
- make pull pulls the latest official Docker image from the container registry (Azure). You'll need an internet connection for this.
- make pack-example packages all files saved in the example_submission directory to submission/submission.zip.
- make test-submission simulates a code execution submission with submission/submission.zip. This will run example_submission/main.py from within a Docker container to generate submission.csv.
In order to prevent leakage of the test features, all logging is prohibited when running inference on the test features as part of an official submission. When submitting on the platform, you will have the ability to submit "smoke tests". Smoke tests run with logging enabled on a reduced version of the training set notes in order to run more quickly. They will not be considered for prize evaluation and are intended to let you test your code for correctness. In this competition, smoke tests will be the only place you can view logs or output from your code to debug. You should test your code locally as thoroughly as possible before submitting your code for smoke tests or for full evaluation.
During a smoke test, you will still have access to data/submission_format.csv and data/test_features.csv. These files will be samples from the training set instead of test data. The data used in smoke tests is available on the data download page. To replicate the smoke test environment locally:
- Save smoke_test_features.csv from the data download page to data/test_features.csv.
- Save smoke_test_labels.csv from the data download page to data/smoke_test_labels.csv. If your code references a submission format file, copy the labels to data/submission_format.csv as well.
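If you switch between training data and smoke test data often, the two steps above can be scripted. This is a sketch only: the function name is hypothetical, and it assumes you have already downloaded both smoke test files into a single folder of your choosing.

```python
import shutil
from pathlib import Path


def stage_smoke_test_data(download_dir, data_dir="data",
                          copy_submission_format=True):
    """Copy downloaded smoke test files to the paths the runtime expects.

    Returns the list of files written under data_dir.
    """
    download_dir = Path(download_dir)
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    # (downloaded file name, name expected under data/)
    pairs = [
        ("smoke_test_features.csv", "test_features.csv"),
        ("smoke_test_labels.csv", "smoke_test_labels.csv"),
    ]
    if copy_submission_format:
        # Only needed if your code reads a submission format file.
        pairs.append(("smoke_test_labels.csv", "submission_format.csv"))

    staged = []
    for src_name, dst_name in pairs:
        target = data_dir / dst_name
        shutil.copyfile(download_dir / src_name, target)
        staged.append(target)
    return staged
```

Note that this overwrites any existing data/test_features.csv, so keep a copy of your previous local test file if you still need it.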
After you generate predictions on the smoke test data using make test-submission, you can score them by running:
python src/scoring.py submission/submission.csv data/smoke_test_labels.csv
If you've followed the above instructions, this score should match the one you receive from the smoke test environment on the platform.
If you want to use a package that is not in the environment, you are welcome to make a pull request to this repository. Remember, your submission will only have access to packages in this runtime repository. If you're new to the GitHub contribution workflow, check out this guide by GitHub.
The runtime manages dependencies using Pixi. Here is a good tutorial to get started with Pixi. The official runtime uses Python 3.10.13.
- Fork this repository.
- Install pixi. See here for installation options.
Edit the
runtime/pixi.toml
file to add your new packages in thedependencies
section. You'll need to determine which environment(s) your new package is required for, and whether the package will be installed with conda (preferred) or pip. We recommend starting without a specific pinned version, and then pinning to the version in the resolvedpixi.lock
file that is generated.-
CPU, GPU, or base: The
pixi.toml
file includes different sections for dependencies that apply to both the CPU and GPU environments (feature.base
), the CPU environment only (feature.cpu
), and the GPU environment only (feature.gpu
). -
Conda or pip: Packages installed using conda are specified by the header
dependencies
. These install from the conda-forge channel usingconda install
. Packages installed with pip are specified by the headerpypi-dependencies
. These install from PyPI usingpip
. Installing packages with conda is strongly preferred. Packages should only be installed usingpip
if they are not available in a conda channel. Conda dependencies are much faster to resolve than PyPI dependencies. -
For example, to add version 0.0.1 of
package1
to both the CPU and GPU environments using conda, you would add the linepackage1 = "0.0.1"
under[feature.base.dependencies]
. To add version 0.2 ofpackage2
to the CPU environment only using pip, you would add the linepackage2 = { version = "0.2.*" }
under the header[feature.cpu.pypi-dependencies]
.[feature.base.dependencies] package1 = "0.0.1" [feature.cpu.pypi-dependencies] package2 = { version = "0.2.*" }
-
-
With Docker open and running, run
make update-lockfile
. This will generate an updatedruntime/pixi.lock
fromruntime/pixi.toml
within a Docker container. -
Locally test that the Docker image builds successfully for both the CPU and GPU environment:
CPU_OR_GPU=cpu make build CPU_OR_GPU=gpu make build
- Commit the changes to your forked repository. Ensure that your branch includes updated versions of both runtime/pixi.toml and runtime/pixi.lock.
- Open a pull request from your branch to the main branch of this repository. Navigate to the Pull requests tab in this repository, and click the "New pull request" button. For more detailed instructions, check out GitHub's help page.
- Once you open the pull request, we will use GitHub Actions to build the Docker images with your changes and run the tests in runtime/tests. For security reasons, administrators may need to approve the workflow run before it happens. Once it starts, the process can take up to 30 minutes, and may take longer if your build is queued behind others. You will see a section on the pull request page that shows the status of the tests and links to the logs ("Details").
- You may be asked to submit revisions to your pull request if the tests fail or if a DrivenData staff member has feedback. Pull requests won't be merged until all tests pass and the team has reviewed and approved the changes.
A Makefile with several helpful shell recipes is included in the repository. The runtime documentation above uses it extensively. Running make by itself in your shell will list relevant Docker images and provide you the following list of available commands:
Available commands:

build               Builds the container locally
clean               Delete temporary Python cache and bytecode files
interact-container  Open an interactive bash shell within the running container
pack-example        Creates a submission/submission.zip file from the source code in
                    example_submission
pack-submission     Creates a submission/submission.zip file from the source code in
                    submission_src
pull                Pulls the official container from Azure Container Registry
test-container      Ensures that your locally built image can import all the Python packages
                    successfully when it runs
test-submission     Runs container using code from `submission/submission.zip` and data from
                    `/code_execution/data/`
update-lockfile     Updates runtime environment lockfile using Docker