Doc v3.4.1 joey
zehuanw committed Mar 1, 2022
1 parent 5989232 commit d8ae8fa
Showing 23 changed files with 65 additions and 37 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -36,7 +36,7 @@ If you'd like to quickly train a model using the Python interface, do the follow

1. Start an NGC container with your local host directory (/your/host/dir) mounted by running the following command:
```
- docker run --gpus=all --rm -it --cap-add SYS_NICE -v /your/host/dir:/your/container/dir -w /your/container/dir -u $(id -u):$(id -g) nvcr.io/nvidia/merlin/merlin-training:22.02
+ docker run --gpus=all --rm -it --cap-add SYS_NICE -v /your/host/dir:/your/container/dir -w /your/container/dir -u $(id -u):$(id -g) nvcr.io/nvidia/merlin/merlin-training:22.03
```

**NOTE**: The **/your/host/dir** directory on the host and the **/your/container/dir** directory in the container refer to the same directory. The **/your/container/dir** directory is also your starting (working) directory inside the container.
2 changes: 1 addition & 1 deletion docs/hugectr_user_guide.md
@@ -77,7 +77,7 @@ The following sample command pulls and starts the Merlin Training container:

```shell
# Run the container in interactive mode
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Building HugeCTR from Scratch
8 changes: 4 additions & 4 deletions notebooks/README.md
@@ -7,12 +7,12 @@ The quickest way to run a notebook here is with a docker container, which provid
### Pull the NGC Docker
To start the [sparse_operation_kit_demo.ipynb](../sparse_operation_kit/notebooks/sparse_operation_kit_demo.ipynb) notebook, pull this docker image:
```
- docker pull nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.02
+ docker pull nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.03
```

To start the other notebooks, pull the docker image using the following command:
```
- docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Clone the HugeCTR Repository
@@ -25,11 +25,11 @@ git clone https://github.com/NVIDIA/HugeCTR

1. Launch the container in interactive mode (mount the HugeCTR root directory into the container for your convenience) by running this command:
```
- docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/merlin-training:22.02
+ docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/merlin-training:22.03
```
To run the [sparse_operation_kit_demo.ipynb](../sparse_operation_kit/notebooks/sparse_operation_kit_demo.ipynb) notebook, launch the container in interactive mode (mount the HugeCTR root directory into the container for your convenience) by running this command:
```
- docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvstaging/merlin/merlin-tensorflow-training:22.02
+ docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvstaging/merlin/merlin-tensorflow-training:22.03
```

2. Start Jupyter using these commands:
2 changes: 1 addition & 1 deletion notebooks/continuous_training.ipynb
@@ -38,7 +38,7 @@
"## Installation\n",
"\n",
"### 1.1 Get HugeCTR from NGC\n",
- "The continuous training module is preinstalled in the [Merlin Training Container](https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-training): `nvcr.io/nvidia/merlin/merlin-training:22.02`.\n",
+ "The continuous training module is preinstalled in the [Merlin Training Container](https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-training): `nvcr.io/nvidia/merlin/merlin-training:22.03`.\n",
"\n",
"You can check the existence of required libraries by running the following Python code after launching this container.\n",
"```bash\n",
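The library-existence check mentioned in the notebook above is truncated in this diff. A minimal sketch of such a check might look like the following; the module names in the example are assumptions about what the container ships, not taken from the truncated snippet:

```python
import importlib.util

def have(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Modules one would expect inside the Merlin training container
# (hypothetical names for illustration).
print({m: have(m) for m in ("hugectr", "hugectr2onnx")})
```

Running this inside the container shows at a glance which required libraries are present.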
2 changes: 1 addition & 1 deletion notebooks/hugectr2onnx_demo.ipynb
@@ -41,7 +41,7 @@
"<a id=\"1\"></a>\n",
"## 1. Access the HugeCTR to ONNX Converter\n",
"\n",
- "1. Please make sure that you start the notebook inside the running NGC docker container: `nvcr.io/nvidia/merlin/merlin-training:22.02`. The module of the ONNX converter is installed to the system path `/usr/local/lib/python3.8/dist-packages`. As for the HugeCTR Python interface, a dynamic link to the `hugectr.so` library is installed to the system path `/usr/local/hugectr/lib/`. You can access the ONNX converter as well as the HugeCTR Python interface anywhere within the container."
+ "1. Please make sure that you start the notebook inside the running NGC docker container: `nvcr.io/nvidia/merlin/merlin-training:22.03`. The module of the ONNX converter is installed to the system path `/usr/local/lib/python3.8/dist-packages`. As for the HugeCTR Python interface, a dynamic link to the `hugectr.so` library is installed to the system path `/usr/local/hugectr/lib/`. You can access the ONNX converter as well as the HugeCTR Python interface anywhere within the container."
]
},
{
2 changes: 1 addition & 1 deletion notebooks/hugectr_criteo.ipynb
@@ -48,7 +48,7 @@
"\n",
"<a id=\"11\"></a>\n",
"### 1.1 Get HugeCTR from NGC\n",
- "The HugeCTR Python module is preinstalled in the [Merlin Training Container](https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-training): `nvcr.io/nvidia/merlin/merlin-training:22.02`.\n",
+ "The HugeCTR Python module is preinstalled in the [Merlin Training Container](https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-training): `nvcr.io/nvidia/merlin/merlin-training:22.03`.\n",
"\n",
"You can check the existence of required libraries by running the following Python code after launching this container.\n",
"```bash\n",
2 changes: 1 addition & 1 deletion notebooks/movie-lens-example.ipynb
@@ -59,7 +59,7 @@
"\n",
"\n",
"### 1.1 Docker containers\n",
- "Please make sure that you have started the notebook inside the running NGC docker container: `nvcr.io/nvidia/merlin/merlin-training:22.02`. The HugeCTR Python interface has been installed to the system path `/usr/local/hugectr/lib/`. Besides, this system path is added to the environment variable `PYTHONPATH`, which means that you can use the HugeCTR Python interface within the docker container environment.\n",
+ "Please make sure that you have started the notebook inside the running NGC docker container: `nvcr.io/nvidia/merlin/merlin-training:22.03`. The HugeCTR Python interface has been installed to the system path `/usr/local/hugectr/lib/`. Besides, this system path is added to the environment variable `PYTHONPATH`, which means that you can use the HugeCTR Python interface within the docker container environment.\n",
"\n",
"### 1.2 Hardware\n",
"This notebook requires a Pascal, Volta, Turing, Ampere, or newer GPU, such as a P100, V100, T4, or A100. "
2 changes: 1 addition & 1 deletion notebooks/news-example.ipynb
@@ -29,7 +29,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Please remember to run this jupyter notebook in the [merlin-training:22.02](https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-training) docker container."
+ "Please remember to run this jupyter notebook in the [merlin-training:22.03](https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-training) docker container."
]
},
{
2 changes: 1 addition & 1 deletion onnx_converter/README.md
@@ -34,7 +34,7 @@ There are several ways to install this package.

**Use NGC Container**

- In the docker image: `nvcr.io/nvidia/merlin/merlin-training:22.02`, hugectr2onnx is already installed and you can directly import this package via:
+ In the docker image: `nvcr.io/nvidia/merlin/merlin-training:22.03`, hugectr2onnx is already installed and you can directly import this package via:
```python
import hugectr2onnx
```
32 changes: 30 additions & 2 deletions release_notes.md
@@ -1,4 +1,32 @@
# Release Notes
## What's New in Version 3.4.1
+ **Support mixed precision inference for datasets with multiple labels**: We enable FP16 for the `Softmax` layer and support mixed precision for multi-label inference. For more information, please refer to [Inference API](docs/python_interface.md#inference-api).

+ **Support multi-GPU offline inference with Python API**: We support multi-GPU offline inference with the Python interface, which can leverage [Hierarchical Parameter Server](docs/hugectr_parameter_server.md) and enable concurrent execution on multiple devices. For more information, please refer to [Inference API](docs/python_interface.md#inference-api) and [Multi-GPU Offline Inference Notebook](notebooks/multi_gpu_offline_inference.ipynb).

+ **Introduction to `_metadata.json`**: We add an introduction to `_metadata.json` for Parquet datasets. For more information, please refer to [Parquet](docs/python_interface.md#parquet).

+ **Documentation and tool for workspace size per GPU estimation**: We add a tool named [embedding_workspace_calculator](tools/embedding_workspace_calculator) to help calculate the `workspace_size_per_gpu_in_mb` required by `hugectr.SparseEmbedding`. For more information, please refer to [embedding_workspace_calculator/README.md](tools/embedding_workspace_calculator/README.md) and [QA 24](docs/QAList.md#24-how-to-set-workspace_size_per_gpu_in_mb-and-slot_size_array).

+ **Improved Debugging Capability**: The old logging system, which had been flagged as deprecated for some time, has been removed. All remaining log messages and outputs have been revised and migrated to the new logging system (`base/debug/logging.hpp/cpp`). During this revision, we also adjusted log levels throughout the entire codebase to improve the visibility of relevant information.

+ **Support HDFS Parameter Server in Training**:
+ Decoupled HDFS in Merlin containers to make the HDFS support more flexible. Users can now compile HDFS-related functionality optionally.
+ Loading and dumping models and optimizer states from HDFS is now supported.
+ Added a [notebook](notebooks/training_with_hdfs.ipynb) to show how to use HugeCTR with HDFS.

+ **Support Multi-hot Inference on the HugeCTR Backend**: We support categorical input in multi-hot format for HugeCTR Backend inference.

+ **Multi-label inference with mixed precision**: Mixed precision training is enabled for the softmax layer.

+ **Python script and documentation demonstrating how to analyze model files**: In this release, we provide a script to retrieve vocabulary information from a model file. For more details, please see the [README](tools/model_analyzer/README.md).

+ **Bug Fixes**:
+ Fixed a mirror strategy bug in SOK (see https://github.com/NVIDIA-Merlin/HugeCTR/issues/291).
+ Fixed an issue where SparseOperationKit could not be imported in nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.03 (see https://github.com/NVIDIA-Merlin/HugeCTR/issues/296).
+ HPS: Fixed an access violation that could occur during initialization when no volatile DB is configured.
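As a hedged illustration of what the embedding workspace calculator mentioned above computes, a back-of-envelope estimate of `workspace_size_per_gpu_in_mb` might look like the following sketch. The real tool also accounts for the embedding type and optimizer states, so this assumed float32-table-only estimate should be read as a rough lower bound:

```python
def estimate_workspace_mb(vocabulary_size, embedding_vec_size,
                          num_gpus=1, bytes_per_element=4):
    """Rough per-GPU embedding-table size in MB, assuming the vocabulary
    is split evenly across GPUs and each element is a float32."""
    rows_per_gpu = vocabulary_size / num_gpus
    total_bytes = rows_per_gpu * embedding_vec_size * bytes_per_element
    return total_bytes / (1024 * 1024)

# 10M categories, 16-dim embedding vectors, 4 GPUs -> roughly 153 MB per GPU
print(round(estimate_workspace_mb(10_000_000, 16, num_gpus=4)))
```

For the real answer, use the shipped calculator, which knows about optimizer state and embedding type overheads.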



## What's New in Version 3.4
+ **Supporting HugeCTR Development with the Merlin Unified Container**: Starting with Merlin v22.02, we encourage you to develop HugeCTR in the Merlin Unified Container (the release container) by following the instructions in the [Contributor Guide](docs/hugectr_contributor_guide.md) for consistency.
@@ -202,6 +230,6 @@
+ HugeCTR uses NCCL to share data between ranks, and NCCL may require shared system memory for IPC and pinned (page-locked) system memory resources. When using NCCL inside a container, it is recommended that you increase these resources by issuing: `--shm-size=1g --ulimit memlock=-1`
See also [NCCL's known issue](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html#sharing-data) and this [GitHub issue](https://github.com/NVIDIA-Merlin/HugeCTR/issues/243).

- + Softmax layer currently does not support fp16 mode.

+ KafkaProducer startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming model updates from Kafka, make sure that a sufficient number of Kafka brokers are up, working properly, and reachable from the node where you run HugeCTR.

+ The number of data files in the file list should be no less than the number of data reader workers. Otherwise, different workers will be mapped to the same file and data loading will not progress as expected.
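The file-list known issue above can be illustrated with a small sketch. The round-robin assignment below is an assumption made for illustration, not HugeCTR's actual reader implementation:

```python
def assign_files(files, num_workers):
    """Hypothetical round-robin mapping of data files to reader workers."""
    return {worker: files[worker % len(files)] for worker in range(num_workers)}

# Two files but four workers: workers 0/2 and 1/3 collide on the same file,
# so two workers make no independent progress.
print(assign_files(["part0.parquet", "part1.parquet"], num_workers=4))
```

With at least as many files as workers, each worker gets a distinct file and loading proceeds as expected.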
4 changes: 2 additions & 2 deletions samples/criteo/README.md
@@ -11,11 +11,11 @@ HugeCTR is available as buildable source code, but the easiest way to install an

1. Pull the HugeCTR NGC Docker by running the following command:
```bash
- $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```
2. Launch the container in interactive mode with the HugeCTR root directory mounted into the container by running the following command:
```bash
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Build the HugeCTR Docker Container on Your Own ###
4 changes: 2 additions & 2 deletions samples/criteo_multi_slots/README.md
@@ -11,11 +11,11 @@ HugeCTR is available as buildable source code, but the easiest way to install an

1. Pull the HugeCTR NGC Docker by running the following command:
```bash
- $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```
2. Launch the container in interactive mode with the HugeCTR root directory mounted into the container by running this command:
```bash
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Build the HugeCTR Docker Container on Your Own ###
4 changes: 2 additions & 2 deletions samples/dcn/README.md
@@ -11,11 +11,11 @@ HugeCTR is available as buildable source code, but the easiest way to install an

1. Pull the HugeCTR NGC Docker by running the following command:
```bash
- $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```
2. Launch the container in interactive mode with the HugeCTR root directory mounted into the container by running the following command:
```bash
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Build the HugeCTR Docker Container on Your Own ###
4 changes: 2 additions & 2 deletions samples/deepfm/README.md
@@ -11,11 +11,11 @@ HugeCTR is available as buildable source code, but the easiest way to install an

1. Pull the HugeCTR NGC Docker by running the following command:
```bash
- $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```
2. Launch the container in interactive mode with the HugeCTR root directory mounted into the container by running the following command:
```bash
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Build the HugeCTR Docker Container on Your Own ###
4 changes: 2 additions & 2 deletions samples/din/README.md
@@ -11,11 +11,11 @@ HugeCTR is available as buildable source code, but the easiest way to install an

1. Pull the HugeCTR NGC Docker by running the following command:
```bash
- $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```
2. Launch the container in interactive mode with the HugeCTR root directory mounted into the container by running the following command:
```bash
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Build the HugeCTR Docker Container on Your Own ###
4 changes: 2 additions & 2 deletions samples/dlrm/README.md
@@ -16,11 +16,11 @@ HugeCTR is available as buildable source code, but the easiest way to install an

1. Pull the HugeCTR NGC Docker by running the following command:
```bash
- $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```
2. Launch the container in interactive mode with the HugeCTR root directory mounted into the container by running the following command:
```bash
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Build the HugeCTR Docker Container on Your Own ###
4 changes: 2 additions & 2 deletions samples/ncf/README.md
@@ -11,11 +11,11 @@ HugeCTR is available as buildable source code, but the easiest way to install an

1. Pull the HugeCTR NGC Docker by running the following command:
```bash
- $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```
2. Launch the container in interactive mode with the HugeCTR root directory mounted into the container by running the following command:
```bash
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Build the HugeCTR Docker Container on Your Own ###
4 changes: 2 additions & 2 deletions samples/wdl/README.md
@@ -11,11 +11,11 @@ HugeCTR is available as buildable source code, but the easiest way to install an

1. Pull the HugeCTR NGC Docker by running the following command:
```bash
- $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-training:22.03
```
2. Launch the container in interactive mode with the HugeCTR root directory mounted into the container by running the following command:
```bash
- $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.02
+ $ docker run --gpus=all --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr nvcr.io/nvidia/merlin/merlin-training:22.03
```

### Build the HugeCTR Docker Container on Your Own ###
2 changes: 1 addition & 1 deletion sparse_operation_kit/ReadMe.md
@@ -18,7 +18,7 @@ Because SOK is compatible with DP training provided by common synchronized traini
There are several ways to install this package. <br>

### *Install this module along with HugeCTR* ###
- In the docker image: `nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.02`, SparseOperationKit is already installed, and you can directly import this module via:
+ In the docker image: `nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.03`, SparseOperationKit is already installed, and you can directly import this module via:
```python
import sparse_operation_kit as sok
```
4 changes: 2 additions & 2 deletions sparse_operation_kit/notebooks/ReadMe.md
@@ -5,7 +5,7 @@ This directory contains a set of Jupyter Notebook demos for SparseOperationKit.
Before trying the notebooks here, you have to follow [these instructions](../../notebooks/README.md#Quickstart) to prepare the operating environment. Summarized below:
+ Pull the NGC Docker
```shell
- $ docker pull nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.02
+ $ docker pull nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.03
```
+ Clone the HugeCTR Repo
```shell
@@ -14,7 +14,7 @@ $ git clone https://github.com/NVIDIA/HugeCTR hugectr
+ Start the Jupyter Notebook
- Launch the container in interactive mode and mount the HugeCTR root directory into the container for your convenience by running this command:
```shell
- $ docker run --runtime=nvidia --rm -it -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.02
+ $ docker run --runtime=nvidia --rm -it -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.03
```
- Start Jupyter using these commands:
```shell