SparseOperationKit (SOK) is a Python package that wraps GPU-accelerated operations dedicated to sparse training and inference. It is designed to be compatible with common deep learning (DL) frameworks such as TensorFlow.
Most of the algorithm implementations in SOK are extracted from HugeCTR, a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and to estimate Click-Through Rates (CTRs). If you are looking for a very efficient solution for CTR estimation, please check HugeCTR.
Model-Parallelism GPU Embedding Layer
In sparse training and inference scenarios such as CTR estimation, the number of parameters is very large and does not fit into the memory of a single GPU. Common DL frameworks do not support model parallelism (MP), so it is hard to fully utilize all available GPUs in a cluster to accelerate the whole training process.
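To make the memory problem concrete, here is a minimal back-of-the-envelope sketch. The table size, embedding dimension, and per-GPU memory are hypothetical values chosen for illustration only; they do not come from SOK or HugeCTR, and optimizer state is ignored.

```python
# Hypothetical sizes chosen only for illustration; they are not SOK defaults.
GIB = 1024 ** 3


def embedding_table_bytes(num_rows: int, embedding_dim: int, bytes_per_value: int = 4) -> int:
    """Memory needed to store a dense float32 embedding table (no optimizer state)."""
    return num_rows * embedding_dim * bytes_per_value


def min_gpus_needed(table_bytes: int, gpu_memory_bytes: int) -> int:
    """Smallest GPU count whose combined memory can hold the table (ceiling division)."""
    return -(-table_bytes // gpu_memory_bytes)


# One billion rows with 128 float32 values each.
table = embedding_table_bytes(num_rows=1_000_000_000, embedding_dim=128)
print(f"table size: {table / GIB:.1f} GiB")  # table size: 476.8 GiB
print("GPUs needed at 80 GiB each:", min_gpus_needed(table, 80 * GIB))  # 6
```

Under these assumptions a single table already exceeds the memory of one GPU several times over, which is why the parameters have to be spread across devices.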
SOK provides MP functionality to fully utilize all available GPUs, whether they are located in a single machine or across multiple machines. Because most DL frameworks already provide data parallelism (DP), SOK is designed to be compatible with DP training to minimize code changes. With SOK embedding layers, you can build a DNN model that mixes MP and DP: MP is used for the embedding parameters, which are distributed among all available GPUs, and DP is used for the other layers, which consume only a small amount of GPU resources.
Several MP embedding layers are integrated into SOK. These embedding layers can leverage all available GPU memory to house embedding parameters, whether on a single machine or across multiple machines. All the utilized GPUs work synchronously.
Because SOK is compatible with the DP training provided by common synchronous training frameworks such as Horovod and TensorFlow Distribute Strategy, the input data fed to these embedding layers is in DP form. No additional DP-to-MP or MP-to-DP transformation is needed when you use SOK to scale your DNN model from a single GPU to multiple GPUs. The following picture depicts the workflow of these embedding layers.
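As a complement to the workflow description, the idea of DP inputs feeding an MP embedding table can be approximated with a small single-process simulation. This is a conceptual sketch only, not SOK's implementation: the GPUs are simulated with plain Python lists, the sharding rule (`key % num_gpus`) is an assumption, and all function names are hypothetical. A real system would perform the exchange steps with GPU collective communication.

```python
from typing import Dict, List

Vector = List[float]


def owner_of(key: int, num_gpus: int) -> int:
    """Assumed sharding rule: the row for `key` lives on GPU `key % num_gpus`."""
    return key % num_gpus


def build_shards(vocab_size: int, num_gpus: int) -> List[Dict[int, Vector]]:
    """Each simulated GPU stores only the rows it owns (model parallelism)."""
    shards: List[Dict[int, Vector]] = [dict() for _ in range(num_gpus)]
    for key in range(vocab_size):
        # Deterministic toy vector so the result is reproducible.
        shards[owner_of(key, num_gpus)][key] = [float(key), float(key) * 0.5]
    return shards


def mp_lookup(shards: List[Dict[int, Vector]], dp_batches: List[List[int]]) -> List[List[Vector]]:
    """dp_batches[g] holds the keys of GPU g's local (data-parallel) batch.

    Returns, for every GPU, the embedding vectors of its own batch, in order.
    """
    num_gpus = len(shards)
    # Step 1: route every key to the GPU that owns its row.
    requests: List[list] = [[] for _ in range(num_gpus)]
    for requester, batch in enumerate(dp_batches):
        for position, key in enumerate(batch):
            requests[owner_of(key, num_gpus)].append((requester, position, key))
    # Step 2: each owner reads its local shard and returns the vectors
    # to the requesting GPU, so the output is again in data-parallel form.
    outputs: List[List[Vector]] = [[[] for _ in batch] for batch in dp_batches]
    for owner, owner_requests in enumerate(requests):
        for requester, position, key in owner_requests:
            outputs[requester][position] = shards[owner][key]
    return outputs


shards = build_shards(vocab_size=8, num_gpus=2)
outputs = mp_lookup(shards, dp_batches=[[0, 3, 5], [6, 1, 1]])
for gpu, vectors in enumerate(outputs):
    print(f"GPU {gpu} receives {vectors}")
```

Each simulated GPU supplies only its own batch and gets back only the vectors for that batch, so the layers that follow the embedding can keep running in DP mode without any extra transformation.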
There are several ways to install this package.
In the Docker image nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.03, SparseOperationKit is already installed, and you can directly import the module via:
import sparse_operation_kit as sok
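If you are unsure whether the module is present in your environment, a quick check along the following lines may help. It uses only the Python standard library; the helper name `is_installed` is hypothetical and not part of SOK.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("sparse_operation_kit"):
    print("sparse_operation_kit is available")
else:
    print("sparse_operation_kit not found; use one of the installation options in this document")
```

The check only locates the module and does not import it, so it runs even on a machine without a GPU.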
$ pip install --user SparseOperationKit
Note: SOK has not yet been uploaded to PyPI, so the command above does not work. You can instead build a pip-installable package yourself with the following steps.
- configure the build environment
Build the SOK pip package on a system where Python 3.x and the following modules are installed: setuptools, os, sys, subprocess, shutil
- build pip package
$ git clone https://github.com/NVIDIA-Merlin/HugeCTR.git hugectr
$ cd hugectr/sparse_operation_kit/
$ python setup.py sdist
- copy that package to target system
$ cp ./dist/*.tar.gz /<YourTargetPath>
- install SOK
$ pip install --user SparseOperationKit
If you want to build this module from source code, follow these steps:
- download the source code
$ git clone https://github.com/NVIDIA-Merlin/HugeCTR hugectr
- install to system
$ cd hugectr/sparse_operation_kit/
$ python setup.py install
To learn more about SparseOperationKit, see the SparseOperationKit documentation.