
Commit 0ed9e7b

Migrated torchrec_dlrm inference models_v2 format (#2341)
Signed-off-by: Minh1 Le <[email protected]>
Signed-off-by: Mahathi Vatsal <[email protected]>
1 parent 1468ea3 commit 0ed9e7b

30 files changed, +36290 −3073 lines changed

README.md

+1-1
@@ -109,7 +109,7 @@ For best performance on Intel® Data Center GPU Flex and Max Series, please chec
 | [Wide & Deep](https://arxiv.org/pdf/1606.07792.pdf) | TensorFlow | Inference | [FP32](/benchmarks/recommendation/tensorflow/wide_deep/inference/README.md) | [Census Income dataset](https://github.com/IntelAI/models/tree/master/benchmarks/recommendation/tensorflow/wide_deep/inference/fp32#dataset) |
 | [DLRM](https://arxiv.org/pdf/1906.00091.pdf) | PyTorch | Inference | [FP32 Int8 BFloat16 BFloat32](/models_v2/pytorch/dlrm/inference/cpu/README.md) | [Criteo Terabyte](/models_v2/pytorch/dlrm/inference/cpu/README.md#datasets) |
 | [DLRM](https://arxiv.org/pdf/1906.00091.pdf) | PyTorch | Training | [FP32 BFloat16 BFloat32](/models_v2/pytorch/dlrm/training/cpu/README.md) | [Criteo Terabyte](/models_v2/pytorch/dlrm/training/cpu/README.md#datasets) |
-| [DLRM v2](https://arxiv.org/pdf/1906.00091.pdf) | PyTorch | Inference | [FP32 FP16 BFloat16 BFloat32 Int8](/quickstart/recommendation/pytorch/torchrec_dlrm/inference/cpu/README.md) | [Criteo 1TB Click Logs dataset](/quickstart/recommendation/pytorch/torchrec_dlrm/inference/cpu#datasets) |
+| [DLRM v2](https://arxiv.org/pdf/1906.00091.pdf) | PyTorch | Inference | [FP32 FP16 BFloat16 BFloat32 Int8](/models_v2/pytorch/torchrec_dlrm/inference/cpu/README.md) | [Criteo 1TB Click Logs dataset](/quickstart/recommendation/pytorch/torchrec_dlrm/inference/cpu#datasets) |

 ### Diffusion

models_v2/pytorch/torchrec_dlrm/inference/cpu/README.md

@@ -0,0 +1,110 @@
# DLRM v2 Inference

DLRM v2 inference best-known configurations with Intel® Extension for PyTorch.

## Model Information

| **Use Case** | **Framework** | **Model Repo** | **Branch/Commit/Tag** | **Optional Patch** |
|:---:| :---: |:--------------:|:---------------------:|:------------------:|
| Inference | PyTorch | https://github.com/facebookresearch/dlrm/tree/main/torchrec_dlrm | - | - |

# Pre-Requisite
## Bare Metal
### General setup

Follow [this link](https://github.com/IntelAI/models/blob/master/docs/general/pytorch/BareMetalSetup.md) to build PyTorch, IPEX, TorchVision, and TCMalloc.

### Model Specific Setup

* Installation of [PyTorch + IPEX + TorchVision, Jemalloc and TCMalloc](https://github.com/IntelAI/models/blob/master/docs/general/pytorch/BareMetalSetup.md)
* Installation of [oneccl-bind-pt](https://pytorch-extension.intel.com/release-whl/stable/cpu/us/oneccl-bind-pt/) (if running distributed)
* Set the Jemalloc and TCMalloc preload for better performance.

Jemalloc and TCMalloc should be built as described in the [General setup](#general-setup) section.
```
export LD_PRELOAD="<path to the jemalloc directory>/lib/libjemalloc.so":"path_to/tcmalloc/lib/libtcmalloc.so":$LD_PRELOAD
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"
```
* Set the IOMP preload for better performance:
```
pip install packaging intel-openmp
export LD_PRELOAD=path/lib/libiomp5.so:$LD_PRELOAD
```

* Set the following environment variable to use AMX if you are running on SPR (Sapphire Rapids):
```bash
export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX
```
* Set the following environment variable to use FP16 AMX if you are running on a supported platform:
```
export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX_FP16
```
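If you want to confirm that the AMX path is actually picked up, one option (a general oneDNN debugging aid, not something this commit configures) is to enable oneDNN verbose logging for a short run and look for AMX in the reported ISA and kernel names:
```bash
# Optional sanity check; assumes the workload dispatches through oneDNN kernels.
# Unset it again afterwards, since verbose logging adds overhead.
export ONEDNN_VERBOSE=1
```
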
## Datasets
The dataset can be downloaded and preprocessed by following https://github.com/mlcommons/training/tree/master/recommendation_v2/torchrec_dlrm#create-the-synthetic-multi-hot-dataset.
We also provide a preprocessing script, `preprocess_raw_dataset.sh`, based on the instructions above.
After you download the raw dataset files `day_*.gz` and unzip them into RAW_DIR, run:
```bash
cd <AI Reference Models>/models_v2/pytorch/torchrec_dlrm/inference/cpu
export MODEL_DIR=$(pwd)
export RAW_DIR=<the unzipped raw dataset>
export TEMP_DIR=<where you choose to put the temp files during preprocessing>
export PREPROCESSED_DIR=<where you choose to put the one-hot dataset>
export MULTI_HOT_DIR=<where you choose to put the multi-hot dataset>
bash preprocess_raw_dataset.sh
```
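As a quick check of the result, the multi-hot directory should end up holding per-day dense, sparse and label files. This layout is inferred from the filenames the bundled `dlrm_dataloader.py` looks for, not from the upstream instructions:
```bash
# Expected contents of $MULTI_HOT_DIR after preprocessing (illustrative):
#   day_0_dense.npy   day_0_sparse_multi_hot.npz   day_0_labels.npy
#   ...
#   day_23_dense.npy  day_23_sparse_multi_hot.npz  day_23_labels.npy
ls "$MULTI_HOT_DIR"
```
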
## Pre-Trained checkpoint
You can download and unzip the checkpoint by following
https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch#downloading-model-weights

## Inference
1. `git clone https://github.com/IntelAI/models.git`
2. `cd models/models_v2/pytorch/torchrec_dlrm/inference/cpu`
3. Create the virtual environment `venv` and activate it:
```
python3 -m venv venv
. ./venv/bin/activate
```
4. Install the general model requirements:
```
./setup.sh
```
5. Install the latest CPU versions of [torch, torchvision and intel_extension_for_pytorch](https://intel.github.io/intel-extension-for-pytorch/index.html#installation).

6. Set the required environment parameters:

| **Parameter** | **export command** |
|:---------------------------:|:------------------------------------------------------------------------------------:|
| **TEST_MODE** (THROUGHPUT, ACCURACY) | `export TEST_MODE=THROUGHPUT` |
| **DATASET_DIR** | `export DATASET_DIR=<multi-hot dataset dir>` |
| **WEIGHT_DIR** (ONLY FOR ACCURACY) | `export WEIGHT_DIR=<officially released checkpoint>` |
| **PRECISION** | `export PRECISION=int8 <specify the precision to run: int8, fp32, bf32 or bf16>` |
| **OUTPUT_DIR** | `export OUTPUT_DIR=$PWD` |
| **BATCH_SIZE** (optional) | `export BATCH_SIZE=10000` |
7. Run `run_model.sh`; a consolidated example follows below.
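For instance, a bf16 throughput run could be launched as follows. This is an illustrative sketch only: the dataset path is a placeholder, and every variable comes from the table above.
```bash
# Illustrative throughput run (adjust the placeholder paths to your environment).
export TEST_MODE=THROUGHPUT
export DATASET_DIR=/data/criteo_multi_hot   # your multi-hot dataset directory
export PRECISION=bf16
export OUTPUT_DIR=$PWD
export BATCH_SIZE=10000                     # optional
bash run_model.sh
```
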
## Output

Single-tile output will typically look like:

```
accuracy 76.215 %, best 76.215 %
dlrm_inf latency: 0.11193203926086426 s
dlrm_inf avg time: 0.007462135950724284 s, ant the time count is : 15
dlrm_inf throughput: 4391235.996821996 samples/s
```

Final results of the inference run can be found in the `results.yaml` file:
```
results:
 - key: throughput
   value: 4391236.0
   unit: inst/s
 - key: latency
   value: 0.007462135950724283
   unit: s
 - key: accuracy
   value: 76.215
   unit: accuracy
```

models_v2/pytorch/torchrec_dlrm/inference/cpu/__init__.py

Whitespace-only changes.

models_v2/pytorch/torchrec_dlrm/inference/cpu/data_process/__init__.py

Whitespace-only changes.

models_v2/pytorch/torchrec_dlrm/training/gpu/data/dlrm_dataloader.py renamed to models_v2/pytorch/torchrec_dlrm/inference/cpu/data_process/dlrm_dataloader.py

+15-43
@@ -1,3 +1,6 @@
+#
+# -*- coding: utf-8 -*-
+#
 # Copyright (c) 2023 Intel Corporation
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -12,6 +15,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+
 #!/usr/bin/env python3
 # Copyright (c) Meta Platforms, Inc. and affiliates.
 #
@@ -31,8 +35,6 @@
     DEFAULT_INT_NAMES,
     InMemoryBinaryCriteoIterDataPipe,
 )
-# This is for crop dataset
-DAYS_MIN=1
 from torchrec.datasets.random import RandomRecDataset

 # OSS import
@@ -96,44 +98,18 @@ def _get_in_memory_dataloader(
         sparse_part = "sparse_multi_hot.npz"
         datapipe = MultiHotCriteoIterDataPipe

-    if args.dataset_name == "criteo_kaggle":
-        # criteo_kaggle has no validation set, so use 2nd half of training set for now.
-        # Setting stage to "test" will get the 2nd half of the dataset.
-        # Setting root_name to "train" reads from the training set file.
-        (root_name, stage) = ("train", "test") if stage == "val" else stage
+    if stage == "train":
         stage_files: List[List[str]] = [
-            [os.path.join(dir_path, f"{root_name}_dense.npy")],
-            [os.path.join(dir_path, f"{root_name}_{sparse_part}")],
-            [os.path.join(dir_path, f"{root_name}_labels.npy")],
+            [os.path.join(dir_path, f"day_{i}_dense.npy") for i in range(DAYS - 1)],
+            [os.path.join(dir_path, f"day_{i}_{sparse_part}") for i in range(DAYS - 1)],
+            [os.path.join(dir_path, f"day_{i}_labels.npy") for i in range(DAYS - 1)],
         ]
-    # criteo_1tb code path uses below two conditionals
-    elif stage == "train":
-        if args.converge:
-            stage_files: List[List[str]] = [
-                [os.path.join(dir_path, f"day_{i}_dense.npy") for i in range(DAYS - 1)],
-                [os.path.join(dir_path, "multihot", f"day_{i}_{sparse_part}") for i in range(DAYS - 1)],
-                [os.path.join(dir_path, f"day_{i}_labels.npy") for i in range(DAYS - 1)],
-            ]
-        else:
-            stage_files: List[List[str]] = [
-                # for crop dataset
-                [os.path.join(dir_path, f"day_{i}_dense.npy") for i in range(DAYS_MIN)],
-                [os.path.join(dir_path, f"day_{i}_{sparse_part}") for i in range(DAYS_MIN)],
-                [os.path.join(dir_path, f"day_{i}_labels.npy") for i in range(DAYS_MIN)],
-            ]
     elif stage in ["val", "test"]:
-        if args.converge:
-            stage_files: List[List[str]] = [
-                [os.path.join(dir_path, f"day_{DAYS-1}_dense.npy")],
-                [os.path.join(dir_path, "multihot", f"day_{DAYS-1}_{sparse_part}")],
-                [os.path.join(dir_path, f"day_{DAYS-1}_labels.npy")],
-            ]
-        else:
-            stage_files: List[List[str]] = [
-                [os.path.join(dir_path, f"day_{DAYS_MIN-1}_dense.npy")],
-                [os.path.join(dir_path, f"day_{DAYS_MIN-1}_{sparse_part}")],
-                [os.path.join(dir_path, f"day_{DAYS_MIN-1}_labels.npy")],
-            ]
+        stage_files: List[List[str]] = [
+            [os.path.join(dir_path, f"day_{DAYS-1}_dense.npy")],
+            [os.path.join(dir_path, f"day_{DAYS-1}_{sparse_part}")],
+            [os.path.join(dir_path, f"day_{DAYS-1}_labels.npy")],
+        ]
     if stage in ["val", "test"] and args.test_batch_size is not None:
         batch_size = args.test_batch_size
     else:
@@ -143,11 +119,8 @@ def _get_in_memory_dataloader(
             stage,
             *stage_files, # pyre-ignore[6]
             batch_size=batch_size,
-            #rank=dist.get_rank(),
-            #world_size=dist.get_world_size(),
-            # The rand and world_size set for custom dist-dlrm
-            rank=0,
-            world_size=1,
+            rank=0, # dist.get_rank(),
+            world_size=1, # dist.get_world_size(),
             drop_last=args.drop_last_training_batch if stage == "train" else False,
             shuffle_batches=args.shuffle_batches,
             shuffle_training_set=args.shuffle_training_set,
@@ -158,7 +131,6 @@ def _get_in_memory_dataloader(
             else ([args.num_embeddings] * CAT_FEATURE_COUNT),
         ),
         batch_size=None,
-        num_workers=0,
         pin_memory=args.pin_memory,
         collate_fn=lambda x: x,
     )

models_v2/pytorch/torchrec_dlrm/training/gpu/data/multi_hot_criteo.py renamed to models_v2/pytorch/torchrec_dlrm/inference/cpu/data_process/multi_hot_criteo.py

+27-1
@@ -1,3 +1,21 @@
+#
+# -*- coding: utf-8 -*-
+#
+# Copyright (c) 2023 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
 #!/usr/bin/env python3
 # Copyright (c) Meta Platforms, Inc. and affiliates.
 #
@@ -214,7 +232,7 @@ def _np_arrays_to_batch(
         offset_per_key = torch.cumsum(
             torch.concat((torch.tensor([0]), torch.tensor(length_per_key))), dim=0
         )
-        values = torch.concat([torch.from_numpy(feat).flatten() for feat in sparse])
+        values = torch.concat([torch.from_numpy(feat.copy()).flatten() for feat in sparse])
         return Batch(
             dense_features=torch.from_numpy(dense.copy()),
             sparse_features=KeyedJaggedTensor(
@@ -308,3 +326,11 @@ def append_to_buffer(

     def __len__(self) -> int:
         return self.num_full_batches // self.world_size + (self.last_batch_sizes[0] > 0)
+
+    def load_batch(self, sample_list=None) -> Batch:
+        if sample_list is None:
+            sample_list = list(range(self.batch_size))
+        dense = self.dense_arrs[0][sample_list, :]
+        sparse = [arr[sample_list, :] % self.hashes[i] for i, arr in enumerate(self.sparse_arrs[0])]
+        labels = self.labels_arrs[0][sample_list, :]
+        return self._np_arrays_to_batch(dense, sparse, labels)
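
A hypothetical usage sketch of the new `load_batch` helper (not part of the commit): it assumes `datapipe` is an already constructed `MultiHotCriteoIterDataPipe`, for example the one built by `dlrm_dataloader.py` above.
```python
# Hypothetical sketch: pull one fixed Batch without iterating the whole pipe,
# e.g. for warm-up or a quick smoke test. `datapipe` is assumed to be a
# MultiHotCriteoIterDataPipe instance created elsewhere.
warmup_batch = datapipe.load_batch()                # first `batch_size` samples of the first loaded day
small_batch = datapipe.load_batch(list(range(8)))   # or an explicit list of sample indices
print(small_batch.dense_features.shape, small_batch.labels.shape)
```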

0 commit comments
