If you are interested in contributing to cuML, your contributions will fall into three categories:
- You want to report a bug, feature request, or documentation issue
  - File an issue describing what you encountered or what you want to see changed.
  - Please run and paste the output of the cuml/print_env.sh script while reporting a bug to gather and report relevant environment details.
  - The RAPIDS team will evaluate the issues and triage them, scheduling them for a release. If you believe the issue needs priority attention, comment on the issue to notify the team.
- You want to propose a new feature and implement it
  - Post about your intended feature, and we shall discuss the design and implementation.
  - Once we agree that the plan looks good, go ahead and implement it, using the code contributions guide below.
- You want to implement a feature or bug-fix for an outstanding issue
  - Follow the code contributions guide below.
  - If you need more context on a particular issue, please ask and we shall provide.
1. Read the project's README.md to learn how to set up the development environment.
2. Find an issue to work on. The best way is to look for the good first issue or help wanted labels.
3. Comment on the issue saying you are going to work on it.
4. Get familiar with the developer guide relevant for you:
   - For C++ developers, see the C++ DEVELOPER_GUIDE.md.
   - For Python developers, a Python DEVELOPER_GUIDE.md is available as well.
5. Code! Make sure to update unit tests!
6. When done, create your pull request.
7. Verify that CI passes all status checks, or fix if needed.
8. Wait for other developers to review your code and update code as needed.
9. Once reviewed and approved, a RAPIDS developer will merge your pull request.
Remember, if you are unsure about anything, don't hesitate to comment on issues and ask for clarifications!
Consistent code formatting is important in the cuML project: it ensures readability and maintainability, and thus simplifies collaboration.
cuML uses pre-commit to execute code linters and formatters that check the code for common issues, such as syntax errors and code style violations, and help to detect bugs. Using pre-commit ensures that linter versions and options are aligned for all developers. The same hooks are executed as part of the CI checks, so running pre-commit checks locally avoids unnecessary CI iterations.
To use pre-commit, install the tool via conda or pip into your development environment:
conda install -c conda-forge pre-commit
Alternatively:
pip install pre-commit
After installing pre-commit, it is recommended to install pre-commit hooks to run automatically before creating a git commit. In this way, it is less likely that style checks will fail as part of CI checks. To install pre-commit hooks, simply run the following command within the repository root directory:
pre-commit install
By default, pre-commit runs on staged files only, meaning only on changes that are about to be committed. To run pre-commit checks on all files, execute:
pre-commit run --all-files
To skip the checks temporarily, use git commit --no-verify or its short form -n.
Note: If the auto-formatters' changes affect each other, you may need to go through multiple iterations of git commit and git add -u.
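The iteration described in this note can be scripted. The following is a minimal sketch, not part of cuML: commit_with_fixes and the GIT_CMD override are hypothetical names introduced here for illustration, and the sketch assumes the pre-commit hooks are already installed so that they run on every git commit.

```shell
# Sketch: retry a commit when pre-commit auto-formatters rewrite files.
# commit_with_fixes is a hypothetical helper, not a cuML script.
# Usage: commit_with_fixes <max_attempts> <git commit arguments...>
# GIT_CMD may be set to another command (for example, a stub for a dry run).
commit_with_fixes() {
  local max_attempts="$1"
  shift
  local git_cmd="${GIT_CMD:-git}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Installed pre-commit hooks run as part of "git commit".
    if "$git_cmd" commit "$@"; then
      return 0
    fi
    # A hook failed, possibly after reformatting files.
    # Re-stage all modified tracked files (this includes any other
    # tracked changes in the working tree) and try again.
    "$git_cmd" add -u
    attempt=$((attempt + 1))
  done
  echo "commit still failing after ${max_attempts} attempts" >&2
  return 1
}
```

For example, commit_with_fixes 3 -m "Fix typo" would attempt the commit up to three times. If it still fails, the remaining problems most likely need a manual fix, since linters such as flake8 only report issues and do not rewrite files.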
cuML also uses codespell to find spelling mistakes, and this check is run as part of the pre-commit hook. To apply the suggested spelling fixes, you can run codespell -i 3 -w . from the command line in the cuML root directory. This will bring up an interactive prompt to select which spelling fixes to apply.
If you want to ignore errors highlighted by codespell you can:
- Add the word to the ignore-words-list in pyproject.toml, to exclude for all of cuML
- Exclude the entire file from spellchecking, by adding to the exclude regex in .pre-commit-config.yaml
- Ignore only specific lines as shown in codespell-project/codespell#1212 (comment)
The pre-commit hooks configured for this repository consist of a number of linters and auto-formatters that we summarize here. For a full and current list, please see the .pre-commit-config.yaml file.
- clang-format: Formats C++ and CUDA code for consistency and readability.
- black: Auto-formats Python code to conform to the PEP 8 style guide.
- flake8: Lints Python code for syntax errors and common code style issues.
- cython-lint: Lints Cython code for syntax errors and common code style issues.
- DeprecationWarning checker: Checks for new DeprecationWarning being introduced in Python code; FutureWarning should be used instead.
- #include syntax checker: Ensures consistent syntax for C++ #include statements.
- Copyright header checker and auto-formatter: Ensures the copyright headers of files are up-to-date and in the correct format.
- codespell: Checks for spelling mistakes.
In order to maintain high-quality code, cuML uses not only pre-commit hooks featuring various formatters and linters but also the clang-tidy tool. Clang-tidy is designed to detect potential issues within the C and C++ code. It is typically run as part of our continuous integration (CI) process.
While it's generally unnecessary for contributors to run clang-tidy locally, there might be cases where you would want to do so. There are two primary methods to run clang-tidy on your local machine: using Docker or Conda.
- Docker
  - Navigate to the repository root directory.
  - Run the following Docker command:
    docker run --rm --pull always \
      --mount type=bind,source="$(pwd)",target=/opt/repo --workdir /opt/repo \
      -e SCCACHE_S3_NO_CREDENTIALS=1 \
      rapidsai/ci-conda:latest /opt/repo/ci/run_clang_tidy.sh
- Conda
  - Navigate to the repository root directory.
  - Create and activate the needed conda environment:
    conda env create --yes -n cuml-clang-tidy -f conda/environments/clang_tidy_cuda-118_arch-x86_64.yaml
    conda activate cuml-clang-tidy
  - Generate the compile command database with
    ./build.sh --configure-only libcuml
  - Run clang-tidy with the following command:
    python cpp/scripts/run-clang-tidy.py --config pyproject.toml
Each PR must be labeled according to whether it is a "breaking" or "non-breaking" change (using GitHub labels). This is used to highlight changes that users should know about when upgrading.
For cuML, a "breaking" change is one that modifies the public, non-experimental, Python API in a non-backward-compatible way. The C++ API does not have an expectation of backward compatibility at this time, so changes to it are not typically considered breaking. Backward-compatible API changes to the Python API (such as adding a new keyword argument to a function) do not need to be labeled.
Additional labels must be applied to indicate whether the change is a feature, improvement, bugfix, or documentation change. See the shared RAPIDS documentation for these labels: https://github.com/rapidsai/kb/issues/42.
Once you have gotten your feet wet and are more comfortable with the code, you can look at the prioritized issues of our next release in our project boards.
Pro Tip: Always look at the release board with the highest number for issues to work on. This is where RAPIDS developers also focus their efforts.
Look at the unassigned issues, and find an issue you are comfortable with contributing to. Start with Step 3 from above, commenting on the issue to let others know you are working on it. If you have any questions related to the implementation of the issue, ask them in the issue instead of the PR.
The cuML repository has two main branches:
- main branch: it contains the last released version. Only hotfixes are targeted and merged into it.
- branch-x.y: it is the development branch which contains the upcoming release. All new features should be based on this branch, and merge/pull requests should target this branch (with the exception of hotfixes).
For every new version x.y of cuML there is a corresponding branch called branch-x.y, where new feature development starts and where PRs are targeted and merged before the release. The exceptions are 'hotfixes', which address critical issues raised by GitHub users: they target the main branch, are merged directly into it, and create a new subversion of the project. When patching an issue that requires a 'hotfix', please state the intent in the PR.
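The targeting rule can be summarized in a few lines of shell. This is only an illustrative sketch: pr_target_branch is a hypothetical helper that restates the rule, and the version number used in the example is a placeholder rather than an actual cuML release.

```shell
# Sketch: pick the branch a pull request should target.
# pr_target_branch is a hypothetical helper, not a cuML script.
# Usage: pr_target_branch <hotfix|feature> <x.y>
pr_target_branch() {
  local kind="$1"
  local version="$2"
  if [ "$kind" = "hotfix" ]; then
    # Hotfixes are the only changes merged directly into main.
    echo "main"
  else
    # Everything else goes to the development branch of the upcoming release.
    echo "branch-${version}"
  fi
}

# Example with a placeholder version number:
pr_target_branch feature 24.02
```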
For all development, push your changes into a branch (created using the naming instructions below) in your own fork of cuML, and then create a pull request when the code is ready.
A few days before releasing version x.y, the code of the current development branch (branch-x.y) will be frozen and a new branch, 'branch-x+1.y', will be created to continue development.
Branches used to create PRs should have a name of the form <type>-<name> which conforms to the following conventions:
- Type:
  - fea - if the branch is for a new feature(s)
  - enh - if the branch is an enhancement of an existing feature(s)
  - bug - if the branch is for fixing a bug(s) or regression(s)
- Name:
  - A name to convey what is being worked on
  - Please use dashes or underscores between words as opposed to spaces.
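The naming convention can be checked mechanically before a branch is created. The sketch below is an illustration only: is_valid_branch_name is a hypothetical helper, not a script shipped with cuML, and it assumes that fea, enh, and bug are the only accepted type prefixes.

```shell
# Sketch: validate a branch name of the form <type>-<name>.
# is_valid_branch_name is a hypothetical helper, not a cuML script.
# Returns 0 if the name follows the convention, 1 otherwise.
is_valid_branch_name() {
  case "$1" in
    *" "*) return 1 ;;                    # spaces are not allowed
    fea-?* | enh-?* | bug-?*) return 0 ;; # known type plus a non-empty name
    *) return 1 ;;                        # unknown type or missing name
  esac
}
```

A possible use is to guard branch creation, for example is_valid_branch_name "fea-new-solver" && git checkout -b "fea-new-solver", where the branch name is again just a placeholder.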
Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md