PyTorch Quantization

Introduction

This repository implements a set of quantization strategies that can be applied to supported layer types.
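To make the idea of a quantization strategy concrete, here is a minimal sketch of symmetric uniform quantization applied to a list of weights. It is written in plain Python so it runs without any dependency. The function name `quantize_uniform` and its `bit_width` parameter are hypothetical and chosen for illustration only; they are not part of this repository's API, and the modes implemented here may differ.

```python
def quantize_uniform(values, bit_width=8):
    """Quantize then dequantize `values` on a symmetric uniform grid.

    Hypothetical helper for illustration; not this repository's API.
    """
    if bit_width < 2:
        raise ValueError("bit_width must be at least 2")
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # Nothing to scale: every value is already zero.
        return [0.0 for _ in values]
    # Largest representable integer level, e.g. 127 for 8 bits.
    q_max = 2 ** (bit_width - 1) - 1
    # Step size between neighbouring quantization levels.
    scale = max_abs / q_max
    quantized = []
    for v in values:
        level = round(v / scale)                # nearest integer level
        level = max(-q_max, min(q_max, level))  # clamp to the valid range
        quantized.append(level * scale)         # map back to a float
    return quantized


weights = [0.5, -1.0, 0.25, 0.0]
print(quantize_uniform(weights, bit_width=8))
```

With this scheme each returned value differs from its input by at most half a step, so lowering `bit_width` trades precision for a smaller set of representable values.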

The code started from the PyTorch and ATen implementations of a fused GRU/LSTM, which were extracted as a CFFI extension and expanded from there.

Building currently requires an appropriate CUDA environment, but execution is supported on CPU as well.

Run python build.py
Add the repository root to the Python path: export PYTHONPATH=/path/to/pytorch-quantization:$PYTHONPATH
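Where setting an environment variable is inconvenient, for example in a notebook, the same effect can be had from inside Python. The sketch below is only an illustration: the helper `add_repo_to_path` is hypothetical, and `/path/to/pytorch-quantization` is the same placeholder used above and must be replaced with the real location of the checkout.

```python
import sys


def add_repo_to_path(repo_path):
    """Prepend repo_path to sys.path unless it is already listed.

    Hypothetical helper for illustration; not part of this repository.
    """
    if repo_path not in sys.path:
        sys.path.insert(0, repo_path)
    return sys.path


# Placeholder path: replace with the actual repository location.
add_repo_to_path("/path/to/pytorch-quantization")
```

Unlike the exported variable, this change lasts only for the current interpreter session.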

The following quantization modes are implemented for weights:

The following quantization modes are implemented for activations:

The following quantized layers are implemented: