Prototype routines for GPU quantization written using PyTorch.

facebookexperimental/protoquant

PROTOQUANT - Dynamic Quantization with Tensor Subclassing

The protoquant package provides dynamic vector-wise quantization and quantized arithmetic using torch.Tensor subclassing.

This dynamic quantization support is aimed at a broad range of applications. It is currently tested with the PyTorch Transformer API and the Better Transformers implementation, with a focus on GPU inference.

Testing has focused on Transformer inference, but protoquant is not limited to that use case: it is broadly applicable to dynamic quantization with PyTorch.
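To make "dynamic vector-wise quantization" concrete, here is a minimal sketch in plain PyTorch. It is not protoquant's API or implementation: the function names (quantize_rowwise, dequantize_rowwise, quantized_matmul) are hypothetical, and the choice of symmetric int8 with one scale per row is an assumption made for illustration. The scales are computed from the data on every call, which is what makes the scheme dynamic.

```python
import torch


def quantize_rowwise(x: torch.Tensor):
    """Symmetric int8 quantization with one scale per row (vector) of x."""
    # The scale is derived from the data at call time, which is what makes
    # the scheme "dynamic". The clamp guards against all-zero rows.
    absmax = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8)
    scale = absmax / 127.0
    q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
    return q, scale


def dequantize_rowwise(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Map int8 values back to floating point using the per-row scales."""
    return q.to(scale.dtype) * scale


def quantized_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Approximate a @ b using int8 operands and per-vector scales."""
    qa, sa = quantize_rowwise(a)      # one scale per row of a: (M, 1)
    qb, sb = quantize_rowwise(b.t())  # one scale per column of b: (N, 1)
    # Accumulate in float32 for portability (assumption for this sketch);
    # the integer products stay exact while the inner dimension is below
    # roughly 1000.
    acc = qa.to(torch.float32) @ qb.to(torch.float32).t()
    # Undo both scalings: sa broadcasts over rows, sb.t() over columns.
    return acc * sa * sb.t()
```

Calling quantized_matmul(a, b) on two float32 matrices returns a float32 result close to a @ b, with an error that grows with the per-row and per-column value ranges. A tensor subclass could hide the (q, scale) pair behind the usual tensor interface, but that wiring is left out of this sketch.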

Installation

You need to clone the repo with recursive submodules.

git clone --recurse-submodules https://github.com/facebookexperimental/protoquant.git

If you forgot the --recurse-submodules flag, you can still fix this afterwards using this trick.

Once the repository is cloned, install the package with pip install -e . from the repository root. This step must be run on a GPU machine.

If you really want to compile on a CPU machine, see here.

License

MIT
