To install cuML from source, ensure the dependencies are met:
- cuDF (>= 0.5.1)
- zlib (provided by zlib1g-dev on Ubuntu 16.04)
- cmake (>= 3.12.4)
- CUDA (>= 9.2)
- Cython (>= 0.29)
- gcc (>= 5.4.0)
- BLAS - Any BLAS compatible with CMake's FindBLAS
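Before starting, it can help to confirm that the command-line tools in the list above meet the stated minimums. The following is a minimal sketch, not part of cuML: `version_ge` and `check_tool` are hypothetical helper names, the version parsing is a heuristic that takes the first dotted number printed by `--version`, and it assumes `sort -V` from GNU coreutils is available. It covers only cmake, gcc, and nvcc; cuDF, Cython, zlib, and BLAS are not checked.

```shell
#!/usr/bin/env bash
# Sketch: compare installed tool versions against the minimums listed above.
# version_ge and check_tool are hypothetical helpers, not part of cuML.

# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool is on PATH and meets the minimum version.
# $1 = tool name, $2 = minimum version
check_tool() {
    local name="$1" minimum="$2" found
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "$name: not found (need >= $minimum)"
        return 1
    fi
    # Heuristic: take the first dotted number in the `--version` output.
    found="$("$name" --version 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if version_ge "$found" "$minimum"; then
        echo "$name: $found (ok, need >= $minimum)"
    else
        echo "$name: $found (too old, need >= $minimum)"
        return 1
    fi
}

# Minimum versions taken from the dependency list above.
check_tool cmake 3.12.4 || true
check_tool gcc   5.4.0  || true
check_tool nvcc  9.2    || true
```

Each line of output names one tool and says whether it was found and whether its version is new enough; a missing or outdated tool is reported but does not stop the script.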
Once dependencies are present, follow the steps below:
- Clone the repository.
$ git clone --recurse-submodules https://github.com/rapidsai/cuml.git
- Build and install libcuml (the C++/CUDA library containing the cuML algorithms), starting from the repository root folder:
$ cd cuML
$ mkdir build
$ cd build
$ export CUDA_BIN_PATH=$CUDA_HOME # (optional CUDA_HOME=/path/to/cuda/)
$ cmake ..
If using a conda environment (recommended currently), then cmake can be configured appropriately via:
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
Note: The following warning message depends on the version of cmake and the CMAKE_INSTALL_PREFIX used. If this warning is displayed, the build should still run successfully. We are currently working to resolve this open issue. You can silence this warning by adding -DCMAKE_IGNORE_PATH=$CONDA_PREFIX/lib to your cmake command.
Cannot generate a safe runtime search path for target ml_test because files
in some directories may conflict with libraries in implicit directories:
The configuration script will print the BLAS found on the search path. If the version found does not match the version intended, use the flag -DBLAS_LIBRARIES=/path/to/blas.so with the cmake command to force your own version.
- Build libcuml:
$ make -j
$ make install
To run tests (optional):
$ ./ml_test
If you want a list of the available tests:
$ ./ml_test --gtest_list_tests
- Build the cuml Python package:
$ cd ../../python
$ python setup.py build_ext --inplace
To run Python tests (optional):
$ py.test -v
If you want a list of the available tests:
$ py.test cuML/test --collect-only
- Finally, install the Python package to your Python path:
$ python setup.py install
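The cmake options mentioned in the build steps above (the conda install prefix, the flag that silences the runtime search path warning, and the BLAS override) can be combined into a single configure command. The sketch below only assembles and prints that command so it can be reviewed before running it from the build folder. `cuml_cmake_args` and the `BLAS_LIB` environment variable are hypothetical names introduced for illustration; only the `-D` flags themselves come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch: assemble the cmake options described above into one command line.
# cuml_cmake_args and BLAS_LIB are hypothetical names used for illustration.

cuml_cmake_args() {
    local args=("..")
    # Inside a conda environment: install into it and silence the
    # "Cannot generate a safe runtime search path" warning.
    if [ -n "${CONDA_PREFIX:-}" ]; then
        args+=("-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}")
        args+=("-DCMAKE_IGNORE_PATH=${CONDA_PREFIX}/lib")
    fi
    # Force a specific BLAS when the one found on the search path
    # is not the one intended.
    if [ -n "${BLAS_LIB:-}" ]; then
        args+=("-DBLAS_LIBRARIES=${BLAS_LIB}")
    fi
    # One argument per line.
    printf '%s\n' "${args[@]}"
}

# Print the command instead of running it.
echo "cmake $(cuml_cmake_args | tr '\n' ' ')"
```

With neither variable set, the printed command reduces to the plain `cmake ..` from the steps above.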
cuML's core structure contains:
- cuML: C++/CUDA machine learning algorithms. This library currently includes the following six algorithms:
- Single GPU Truncated Singular Value Decomposition (tSVD)
- Single GPU Principal Component Analysis (PCA)
- Single GPU Density-based Spatial Clustering of Applications with Noise (DBSCAN)
- Single GPU Kalman Filtering
- Multi-GPU K-Means Clustering
- Multi-GPU K-Nearest Neighbors (Uses Faiss)
- python: Python bindings for the above algorithms, including interfaces for cuDF. These bindings connect the data to the C++/CUDA based cuML and ml-prims libraries without leaving GPU memory.
- ml-prims: Low-level machine learning primitives, provided as a header-only library and used in cuML algorithms. Includes:
- Linear Algebra
- Statistics
- Basic Matrix Operations
- Distance Functions
- Random Number Generation
The external folder contains submodules that this project in turn depends on. Appropriate location flags are automatically populated in the main CMakeLists.txt file for these.
Current external submodules are: