umd: open source compiler code
Release DLA1.2.0 compiler source code

Signed-off-by: Prashant Gaikwad <[email protected]>
Signed-off-by: Mitch Harwell <[email protected]>
Signed-off-by: Gunjan Mehta <[email protected]>
Signed-off-by: Ken Adams <[email protected]>
Signed-off-by: Arvind M <[email protected]>
prasshantg committed Aug 28, 2019
1 parent 38a6300 commit 1ae4738
Showing 211 changed files with 369,753 additions and 77 deletions.
7 changes: 7 additions & 0 deletions CompilerFeatures.md
@@ -32,6 +32,13 @@
||EltWise MAX|✓|Not implemented in SW|
|**LRN**||✓|Not implemented in SW|

### Frameworks support

|Framework|Status|
|---------|-------|
|Caffe|✓|
|ONNX|Future|

### Networks verification report

|Network|Configuration|fp16|int8|
59 changes: 25 additions & 34 deletions LowPrecision.md
@@ -1,6 +1,6 @@
# Low precision support in NVDLA

Use of low precision, such as 8-bit, 4-bit, or even fewer bits, for inference is one of the optimization methods used in deep learning. The NVDLA architecture includes INT8 (8-bit) precision support. It helps to compress the model, reducing its memory footprint, and to improve performance with a small degradation in accuracy. Using INT8 precision for inference requires quantizing pre-trained models from floating point to INT8 and programming converters in NVDLA for scaling/re-scaling tensors.
Use of low precision, such as 8-bit, 4-bit, or even fewer bits, for inference is one of the optimization methods used in deep learning. It helps to compress the model, reducing its memory footprint, and to improve performance with a small degradation in accuracy. Using INT8 precision for inference requires quantizing pre-trained models from floating point to INT8 and programming converters in NVDLA for scaling/re-scaling tensors.
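
For intuition, symmetric INT8 quantization maps a floating-point value x to an 8-bit integer q = clamp(round(x / scale), -128, 127), and re-scaling recovers x ≈ q * scale. A minimal sketch follows; the helper functions are illustrative only, not NVDLA APIs:

```
#include <algorithm>
#include <cmath>
#include <cstdint>

// Symmetric per-tensor quantization: the scale is typically chosen as
// max_abs_value / 127 from the calibrated dynamic range.
int8_t quantizeToInt8(float x, float scale)
{
    float q = std::round(x / scale);
    return static_cast<int8_t>(std::max(-128.0f, std::min(127.0f, q)));
}

// Re-scaling (de-quantization) recovers an approximation of the original value.
float dequantize(int8_t q, float scale)
{
    return static_cast<float>(q) * scale;
}
```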

### NVDLA architecture for INT8 precision support includes the following:
- INT8 input/output data read/write
@@ -9,14 +9,24 @@
- Per-tensor and per-kernel output re-scaling using output converters

### Steps to generate INT8 quantized model:
- Analyze the dynamic range of per-layer tensors and calculate scale factors
- Analyze the dynamic range of per-layer tensors and calculate scale factors using TensorRT
- Import scale factors generated using TensorRT to NVDLA JSON format
- Quantize model weights and determine the converter parameters using scale factors

#### Analyze dynamic range of per-layer tensors and calculate scale factors
A calibration tool can collect the dynamic range of the output tensor for each layer over a dataset of images. This dynamic range information can be used to calculate per-tensor scale factors. The NVDLA Compiler uses the following JSON schema to import scale factors.
#### Analyze dynamic range of per-layer tensors and calculate scale factors using TensorRT
A calibration tool collects the dynamic range of the output tensor for each layer over a dataset of images. This dynamic range information can be used to calculate per-tensor scale factors. For NVDLA, the TensorRT calibration interface is used to generate scale factors.

Refer to https://github.com/NVIDIA/TensorRT/tree/release/5.1/samples/opensource/sampleINT8 for a sample application that shows how to use TensorRT to generate scale factors.

Notes:
- Use IInt8EntropyCalibrator2 for calibration (a sketch follows this list).
- Dump the calibration scales using writeCalibrationCache() so that they can be converted to the NVDLA JSON format.
- Do not set --useDLACore for calibration; that option is for generating a runtime engine through TensorRT on NVIDIA Xavier platforms, such as NVIDIA Jetson AGX Xavier, which have NVDLA integrated.
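
A rough sketch of such a calibrator, assuming the TensorRT 5.x C++ API (batch feeding and image preprocessing are elided):

```
#include <cstddef>
#include <fstream>

#include "NvInfer.h"

// Sketch of an entropy calibrator; a real implementation would stream
// preprocessed calibration images to device memory in getBatch().
class EntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    int getBatchSize() const override { return 1; }

    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        // Fill bindings[i] with a device pointer to the next batch for each
        // input tensor named names[i]; return false when the data set is done.
        return false;
    }

    const void* readCalibrationCache(size_t& length) override
    {
        length = 0;
        return nullptr; // no cache: force calibration to run
    }

    void writeCalibrationCache(const void* cache, size_t length) override
    {
        // Dump the calibration scales; this cache is what gets converted
        // to the NVDLA JSON format afterwards.
        std::ofstream out("calibration.cache", std::ios::binary);
        out.write(static_cast<const char*>(cache),
                  static_cast<std::streamsize>(length));
    }
};
```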

##### JSON schema for calibration table

The NVDLA Compiler uses the following JSON schema to import scale factors generated from TensorRT.

```
{
    "type" : "object",
    ...
}
```

##### Sample calibration table for first few layers of ResNet-50 using symmetric scaling
##### How to convert the calibration cache dump to NVDLA JSON format

```
{
    "data" : {
        "scale": 0.00781453,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "conv1" : {
        "scale": 0.0891214,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "pool1" : {
        "scale": 0.0891214,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "res2a_branch1" : {
        "scale": 0.119546,
        "min": 0,
        "max": 0,
        "offset": 0
    }
}
```
[calib_txt_to_json.py](https://github.com/nvdla/sw/tree/master/umd/utils/calibdata/calib_txt_to_json.py) can be used to convert calibration scales generated from TensorRT to NVDLA JSON format.
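
Under the hood the conversion is small: each line of the TensorRT cache pairs a tensor name with a hex string, which (an assumption about the cache layout; verify against your TensorRT version) is the IEEE-754 bit pattern of the float32 scale. A minimal C++ equivalent of the per-line decode:

```
#include <cstdint>
#include <cstring>
#include <string>

// Decode a hex string such as "3c010a14" into the float scale whose
// bit pattern it encodes.
float scaleFromHex(const std::string& hex)
{
    uint32_t bits = static_cast<uint32_t>(std::stoul(hex, nullptr, 16));
    float scale;
    std::memcpy(&scale, &bits, sizeof(scale)); // reinterpret the bits as float32
    return scale;
}
```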

#### Quantize model weights and determine the converter parameters

The NVDLA Compiler can quantize model weights and determine the converter parameters using the scale factors from the calibration table.

Use the --calibtable argument to pass the calibration table generated from TensorRT as input to the NVDLA Compiler.

#### Example

A sample calibration table for the [ResNet-50 Caffe model](https://github.com/KaimingHe/deep-residual-networks) is shared at [calib.json](https://github.com/nvdla/sw/tree/master/umd/utils/calibdata/calib.json).

This calibration table can be used with the NVDLA Compiler and the [ResNet-50 Caffe model](https://github.com/KaimingHe/deep-residual-networks) to run ResNet-50 on an NVDLA INT8 configuration.
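
A hypothetical invocation is sketched below; --calibtable is confirmed above, while the remaining flag names, file names, and the nv_full target are assumptions based on the compiler's usual options:

```
./nvdla_compiler --prototxt ResNet-50-deploy.prototxt \
                 --caffemodel ResNet-50-model.caffemodel \
                 --configtarget nv_full \
                 --cprecision int8 \
                 --calibtable calib.json
```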
25 changes: 25 additions & 0 deletions Roadmap.md
@@ -0,0 +1,25 @@
# NVDLA Roadmap

### DLA 1.3.0

- HW Multibatch for FC layers
- Multi-input network support
- Support different precision and format for input tensors
- Buffer pre-registration
- INT8 deconvolution
- Deconvolution optimization
- Support deconvolution with stride > 32
- INT8 group convolution
- Depthwise convolution optimization
- ReLU-N
- Machine Translation Layer (MTL)

Note: APIs are expected to change in DLA 1.3.0.

### Future

- Memory optimizations
- ONNX
- Sample application for accuracy
- Sample application for object detection

18 changes: 13 additions & 5 deletions umd/Makefile
@@ -1,4 +1,4 @@
# Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -25,11 +25,19 @@
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


SUBDIRS = core/runtime \
          tests/runtime
COMPILER_SUBDIRS = core/src/compiler \
                   apps/compiler

subdirs:
	for dir in $(SUBDIRS); do \
RUNTIME_SUBDIRS = core/src/runtime \
                  apps/runtime

compiler:
	for dir in $(COMPILER_SUBDIRS); do \
		$(MAKE) -C $$dir; \
	done

runtime:
	for dir in $(RUNTIME_SUBDIRS); do \
		$(MAKE) -C $$dir; \
	done
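
With these targets, the compiler and runtime trees can be built independently:

```
make compiler   # builds core/src/compiler and apps/compiler
make runtime    # builds core/src/runtime and apps/runtime
```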

121 changes: 121 additions & 0 deletions umd/apps/compiler/CompileTest.cpp
@@ -0,0 +1,121 @@
/*
* Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of NVIDIA CORPORATION nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#include "main.h"

#include "nvdla/IProfile.h"
#include "nvdla/IProfiler.h"
#include "nvdla/IWisdom.h"
#include "nvdla/INetwork.h"
#include "nvdla/ICompiler.h"
#include "nvdla/ITargetConfig.h"

#include "ErrorMacros.h"
#include "nvdla_os_inf.h"

NvDlaError compileProfile(const TestAppArgs* appArgs, TestInfo* i)
{
    NvDlaError e = NvDlaSuccess;
    std::string profileName = "";
    std::string targetConfigName = "";

    NvDlaFileHandle file = 0;
    std::string fileName = "";
    NvU8 *buffer = 0;
    NvU64 size = 0;

    nvdla::ICompiler* compiler = i->wisdom->getCompiler();
    if (!compiler)
        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "wisdom->getCompiler() failed");

    if (appArgs->configtarget == "")
        ORIGINATE_ERROR_FAIL(NvDlaError_NotInitialized, "No target config found to load");

    targetConfigName = appArgs->configtarget;

    // Determine profile
    PROPAGATE_ERROR_FAIL(generateProfile(appArgs, &profileName, i));

    // Compile
    NvDlaDebugPrintf("compiling profile \"%s\"... config \"%s\"...\n", profileName.c_str(), targetConfigName.c_str());
    PROPAGATE_ERROR_FAIL(compiler->compile(profileName.c_str(), targetConfigName.c_str(), &i->compiledLoadable));

    // Get loadable buffer and dump it into a file
    PROPAGATE_ERROR_FAIL(compiler->getLoadableImageSize(profileName.c_str(),
                                                        &size));
    if (size == 0) {
        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter,
                             "Invalid size for a loadable");
    }

    buffer = (NvU8 *) NvDlaAlloc(size);
    if (buffer == NULL) {
        ORIGINATE_ERROR_FAIL(NvDlaError_InsufficientMemory,
                             "Failed to allocate buffer for loadable");
    }
    PROPAGATE_ERROR_FAIL(compiler->getLoadableImage(profileName.c_str(),
                                                    buffer));
    fileName = profileName + ".nvdla";
    PROPAGATE_ERROR_FAIL(NvDlaFopen(fileName.c_str(), NVDLA_OPEN_WRITE, &file));
    PROPAGATE_ERROR_FAIL(NvDlaFwrite(file, buffer, size));

fail:
    NvDlaFclose(file);
    if (buffer != NULL)
        NvDlaFree(buffer);
    return e;
}

NvDlaError compile(const TestAppArgs* appArgs, TestInfo* i)
{
    NvDlaError e = NvDlaSuccess;

    i->compiledLoadable = 0;

    NvDlaDebugPrintf("creating new wisdom context...\n");
    i->wisdom = nvdla::createWisdom();
    if (!i->wisdom)
        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "createWisdom() failed");

    NvDlaDebugPrintf("opening wisdom context...\n");
    if (!i->wisdom->open(i->wisdomPath))
        ORIGINATE_ERROR_FAIL(NvDlaError_BadParameter, "wisdom->open() failed to open: \"%s\"", i->wisdomPath.c_str());

    // Compile
    PROPAGATE_ERROR_FAIL(compileProfile(appArgs, i));

    NvDlaDebugPrintf("closing wisdom context...\n");
    i->wisdom->close();

fail:
    if (i->wisdom != NULL) {
        nvdla::destroyWisdom(i->wisdom);
        i->wisdom = NULL;
    }
    return e;
}
