This document summarizes guidelines and best practices for contributions to the Python component of cuML, the machine learning library of the RAPIDS ecosystem. This is an evolving document, so contributions, clarifications and issue reports are highly welcome.
Please start by reading the project's contributing guidelines and the C++ DEVELOPER_GUIDE.md.

For thread safety considerations, refer to the section on thread safety in the C++ DEVELOPER_GUIDE.md.
Coding style:

- Python code follows PEP8, and flake8 is used to check adherence to this style.
- Follow the sklearn coding guidelines.
To create a new estimator / algorithm:

- Make sure that the algorithm has been implemented on the C++ side. Refer to the C++ DEVELOPER_GUIDE.md for guidelines on developing in C++.
- The remaining items in this list cover the Python-side steps.
- Create a corresponding `algoName.pyx` file inside the `python/cuml` folder.
- Ensure that the folder structure inside `python/cuml` reflects that of sklearn. For example, `pca.pyx` should be kept inside the `decomposition` sub-folder of `python/cuml`.
- Match the corresponding scikit-learn interface as closely as possible. Refer to sklearn's developer guide on the API design of sklearn objects for details.
- Always make sure to have your class inherit from `cuml.Base` as its parent/ancestor (a minimal sketch follows this list).
- Ensure that the estimator's output fields follow the 'underscore on both sides' convention explained in the documentation of `cuml.Base`. This allows it to support configurable output types.
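As a rough illustration of these conventions, here is a minimal, hypothetical estimator skeleton. It is not a real cuML class, and the exact `cuml.Base` constructor arguments are assumptions that may vary between cuML versions:

```python
import cuml


class HypotheticalAlgo(cuml.Base):
    """Illustrative sketch only -- not an actual cuML estimator."""

    def __init__(self, handle=None, verbose=False, output_type=None):
        # cuml.Base takes care of the handle, verbosity and configurable
        # output type; the argument names used here are assumed and may
        # differ by version.
        super().__init__(handle=handle, verbose=verbose,
                         output_type=output_type)

    def fit(self, X):
        # A real estimator would call into the C++ implementation through
        # its Cython wrapper here. Output fields follow the naming
        # convention documented in cuml.Base.
        self.components_ = None  # placeholder fitted attribute
        return self
```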
Calls into CUDA runtime APIs made through `cuml.cuda` will raise a `cuml.cuda.CudaRuntimeError` if any error occurs. For example:
```python
from cuml.cuda import Stream, CudaRuntimeError

try:
    s = Stream()
    s.sync()
except CudaRuntimeError as cre:
    print("Cuda Error! '%s'" % str(cre))
```
We mostly follow PEP 257 style docstrings for documenting the interfaces.
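For illustration, a hypothetical method documented in this style (with numpydoc-style sections on top of the PEP 257 conventions) might look like:

```python
def fit(self, X):
    """Fit the model to the input data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Training data.

    Returns
    -------
    self
        The fitted estimator.
    """
    ...
```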
We use pytest (https://docs.pytest.org/en/latest/) for writing and running tests. To see existing examples, refer to any of the `test_*.py` files in the folder `cuml/test`.
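A minimal test might look like the sketch below; the fitted attribute asserted on and the accepted input types are assumptions for illustration and may differ between versions:

```python
# test_pca_sketch.py -- a minimal pytest sketch.
import numpy as np
import pytest

from cuml import PCA


@pytest.mark.parametrize("n_components", [1, 2])
def test_pca_explained_variance(n_components):
    X = np.random.rand(100, 4).astype(np.float32)
    pca = PCA(n_components=n_components)
    pca.fit(X)
    # One explained-variance entry is expected per requested component.
    assert len(pca.explained_variance_) == n_components
```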
TODO: talk about enabling RMM here when it is ready
If you want to schedule the execution of two algorithms concurrently, it is better to create two separate streams, assign each to its own handle, and then run the algorithms using those handles:
```python
import cuml
from cuml.cuda import Stream

s1 = Stream()
h1 = cuml.Handle()
h1.setStream(s1)

s2 = Stream()
h2 = cuml.Handle()
h2.setStream(s2)

algo1 = cuml.Algo1(handle=h1, ...)
algo2 = cuml.Algo2(handle=h2, ...)
algo1.fit(X1, y1)
algo2.fit(X2, y2)
```
For more details on stream ordering, refer to the corresponding section of the C++ DEVELOPER_GUIDE.md.
We currently have Single Process Multiple GPU (SPMG) versions of KNN, OLS and tSVD. Upcoming versions will concentrate on the One Process per GPU (OPG) paradigm.
TODO: Add more details.