doc: add compiler features and LP support doc
Signed-off-by: Prashant Gaikwad <[email protected]>
prasshantg committed May 28, 2019
1 parent 9860451 commit 9d389fb
Showing 2 changed files with 133 additions and 0 deletions.
52 changes: 52 additions & 0 deletions CompilerFeatures.md
# DLA Compiler

### Layers and features support

|Layer|Feature|FP16|INT8|
|-----------|---------------|-------|-------|
|**Convolution**||&#10004;|&#10004;|
||Dilation|&#10004;|&#10004;|
||Winograd|&#10004;|Not implemented in SW|
|**Deconvolution**||&#10004;|&#10004;|
||With padding|Not implemented in SW|Not implemented in SW|
||Winograd|Not implemented in SW|Not implemented in SW|
|**Fully Connected**||&#10004;|&#10004;|
||Winograd|Not implemented in SW|Not implemented in SW|
|**Group Convolution**||&#10004;|Not implemented in SW|
||Winograd|&#10004;|Not implemented in SW|
|**Pooling**||&#10004;|&#10004;|
||Max|&#10004;|&#10004;|
||Min|&#10004;|&#10004;|
||Avg|&#10004;|&#10004;|
||Inclusive padding|&#10004;|&#10004;|
||Exclusive padding|Not supported in HW|Not supported in HW|
|**Activation**||||
||Bias|&#10004;|&#10004;|
||BatchNorm|&#10004;|&#10004;|
||Scale|&#10004;|&#10004;|
||Sigmoid|&#10004;|Not implemented in SW|
||Tanh|&#10004;|Not implemented in SW|
||EltWise SUM|&#10004;|&#10004;|
||EltWise SUB|Not supported in HW|Not supported in HW|
||EltWise MIN|&#10004;|Not implemented in SW|
||EltWise MAX|&#10004;|Not implemented in SW|
|**LRN**||&#10004;|Not implemented in SW|

### Networks verification report

|Network|Configuration|FP16|INT8|
|-------|----|----|----|
|MNIST|nv_full, nv_large, nv_small|Verified|Verified|
|ResNet-18|nv_full, nv_large, nv_small|Verified|Verified|
|ResNet-50|nv_full, nv_large, nv_small|Verified|Verified|

### Known limitations
- Not supported in HW
- Dilation with Winograd
- EltWise SUB
- Pooling and convolution layers where pad size is greater than kernel size
- Not implemented in SW
- Deconvolution with strides > 32
- Deconvolution with input/output padding


81 changes: 81 additions & 0 deletions LowPrecision.md
# Low precision support in NVDLA

The use of low precision, such as 8-bit, 4-bit, or even fewer bits, for inference is one of the optimization methods used in deep learning. The NVDLA architecture includes INT8 (8-bit) precision support. It helps compress the model, reducing its memory footprint, and improves performance with only a small degradation in accuracy. Using INT8 precision for inference requires quantizing pre-trained models from floating point to INT8 and programming the converters in NVDLA for scaling/re-scaling tensors.
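
As background, symmetric INT8 quantization maps floating-point values to 8-bit integers through a single per-tensor scale factor. The snippet below is a minimal NumPy sketch of that mapping, not part of the NVDLA toolchain.

```
import numpy as np

def quantize_int8(x, scale):
    """Symmetric quantization: one float step of size `scale` per integer."""
    q = np.round(x / scale)                   # scale, then round to nearest
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale):
    """Recover an approximate float tensor from INT8 values."""
    return q.astype(np.float32) * scale

# Values in [-1, 1] quantized with scale = 1/127:
x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
print(quantize_int8(x, 1.0 / 127))            # [-127  -64    0   32  127]
```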

### NVDLA architecture support for INT8 precision includes the following:
- INT8 input/output data read/write
- 32-bit internal pipeline, which avoids saturation in mathematical computations
- Per-tensor input scaling using input converters
- Per-tensor and per-kernel output re-scaling using output converters (the converter arithmetic is sketched below this list)
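
To make the converter roles concrete, the following minimal sketch models the INT8 math for a single output channel, assuming symmetric scaling; the function and variable names are illustrative, not the NVDLA hardware interface.

```
import numpy as np

def int8_conv_channel(x_q, w_q, in_scale, w_scale, out_scale):
    """Simplified model of INT8 convolution math for one output channel.

    x_q, w_q:  INT8 activations and weights (already quantized)
    in_scale:  per-tensor scale of the input activations
    w_scale:   per-kernel scale of this channel's weights
    out_scale: per-tensor scale chosen for the output tensor
    """
    # Multiply-accumulate in a wide 32-bit accumulator so intermediate
    # sums do not saturate, mirroring the 32-bit internal pipeline.
    acc = np.sum(x_q.astype(np.int32) * w_q.astype(np.int32))

    # The accumulator holds values in units of (in_scale * w_scale);
    # the output converter re-scales them into the output tensor's units.
    out = np.round(acc * (in_scale * w_scale) / out_scale)
    return np.clip(out, -128, 127).astype(np.int8)
```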

### Steps to generate an INT8 quantized model:
- Analyze the dynamic range of per-layer tensors and calculate scale factors
- Quantize model weights and determine the converter parameters using scale factors

#### Analyze the dynamic range of per-layer tensors and calculate scale factors
A calibration tool can collect the dynamic range of the output tensor for each layer over a dataset of images. This dynamic range information can be used to calculate per-tensor scale factors, as sketched below. The NVDLA Compiler uses the JSON schema that follows the sketch to import scale factors.
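
For instance, under symmetric scaling the scale factor follows directly from the largest absolute value in the calibrated range; a small, hypothetical helper:

```
def scale_from_range(t_min, t_max, num_bits=8):
    """Derive a symmetric per-tensor scale factor from a calibrated range.

    With symmetric scaling the offset is 0 and the integer range
    [-127, 127] must cover the largest absolute value observed.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8
    return max(abs(t_min), abs(t_max)) / qmax

# e.g. a tensor calibrated to roughly [-0.99, 0.99] yields a scale near
# 0.0078, the same order as the "data" entry in the sample table below.
```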

##### JSON schema for calibration table

```
{
    "type" : "object",
    "description": "JSON schema for calibration table",
    "layer" : {
        "type": "array",
        "description": "per-layer scale factor for the output tensor; the scale factor can be described using either scale or min/max",
        "oneOf": ["scale", ["min", "max"]],
        "scale": {
            "type": "float",
            "description": "scale value calibrated for the output tensor of the layer"
        },
        "min": {
            "type": "float",
            "description": "minimum value of the source precision dynamic range for the output tensor of the layer"
        },
        "max": {
            "type": "float",
            "description": "maximum value of the source precision dynamic range for the output tensor of the layer"
        },
        "offset": {
            "type" : "integer",
            "description": "offset used for asymmetric scaling; it should be 0 for symmetric scaling"
        }
    }
}
```

##### Sample calibration table for first few layers of ResNet-50 using symmetric scaling

```
{
    "data" : {
        "scale": 0.00781453,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "conv1" : {
        "scale": 0.0891214,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "pool1" : {
        "scale": 0.0891214,
        "min": 0,
        "max": 0,
        "offset": 0
    },
    "res2a_branch1" : {
        "scale": 0.119546,
        "min": 0,
        "max": 0,
        "offset": 0
    }
}
```
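
A consumer of such a table might resolve each layer's scale factor by preferring an explicit scale and falling back to the min/max pair; the sketch below is illustrative, and the file name is hypothetical.

```
import json

def load_calib_scales(path):
    """Read a calibration table and return {layer_name: scale}."""
    with open(path) as f:
        table = json.load(f)
    scales = {}
    for layer, entry in table.items():
        if entry.get("scale"):                # explicit scale wins
            scales[layer] = entry["scale"]
        else:                                 # derive from min/max
            scales[layer] = max(abs(entry["min"]), abs(entry["max"])) / 127.0
    return scales

# e.g. load_calib_scales("resnet50_calib.json")["conv1"] -> 0.0891214
```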

#### Quantize model weights and determine the converter parameters

The NVDLA Compiler can quantize the model weights and determine the converter parameters using the scale factors from the calibration table.
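
As an illustration of what that step involves, per-kernel weight quantization chooses one scale per output channel and quantizes that channel's weights with it; this is a sketch of the general technique, not the compiler's implementation.

```
import numpy as np

def quantize_weights_per_kernel(w):
    """Quantize a float weight tensor [K, C, H, W] with one scale per kernel.

    Returns INT8 weights and per-kernel scales; a compiler would combine
    these with the calibration scale factors to program the converters.
    """
    w_q = np.empty(w.shape, dtype=np.int8)
    w_scales = np.empty(w.shape[0], dtype=np.float32)
    for k in range(w.shape[0]):
        # Guard against all-zero kernels to avoid division by zero.
        w_scales[k] = max(np.abs(w[k]).max(), 1e-8) / 127.0
        w_q[k] = np.clip(np.round(w[k] / w_scales[k]), -128, 127)
    return w_q, w_scales
```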
