GitHub - lz02k/Sparsebit: A model compression and acceleration toolbox based on PyTorch.

Chinese version

News

2023.04.27: 🔥 Pipeline parallelism is now supported for alpaca-qlora, which enables fine-tuning llama-65b on 8 x 2080ti GPUs within 13 hours.
2023.04.15: 🔥 We released alpaca-qlora, which needs about half the GPU memory for the model compared with alpaca-lora. With alpaca-qlora, a single 2080ti is enough to instruction fine-tune llama-7b/13b.
2023.03.20: 🔥 We implemented a GPTQ CUDA kernel with the groupsize feature and added --single_device_mode so that all quantized LLaMA models can run on a single GPU (such as a 2080ti). See GPTQ for LLaMA.
2023.03.08: Released a mixed-precision quantization method based on GPTQ for LLaMA.
2023.02.23: Released a PTQ example of GPT2 on WikiText2.
2022.12.26: Released a QAT example of BEVDet4D.
2022.12.14: Released a QAT example of BEVDepth.
2022.12.13: Released some examples of BERT.
2022.11.24: Released a QAT example of BEVDet.

Introduction

Sparsebit is a toolkit with pruning and quantization capabilities. It is designed to help researchers compress and accelerate neural network models by modifying only a few lines of code in an existing PyTorch project.

Quantization

Quantization converts full-precision parameters into low-bit parameters, which compresses and accelerates the model without changing its structure. This toolkit supports two common quantization paradigms, Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), with the following features:

Benefiting from the support of torch.fx, Sparsebit operates on a QuantModel, and each operation becomes a QuantModule.
Sparsebit can easily be extended by users to support their own research. Users can register their own implementations of key objects such as QuantModule, Quantizer and Observer.
Exporting QDQ-ONNX is supported, which can be loaded and deployed by backends such as TensorRT and OnnxRuntime.
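
To make the quantize/dequantize (QDQ) idea above concrete, here is a minimal sketch of 8-bit asymmetric quantization driven by a min-max observer. It uses plain Python lists instead of tensors so the arithmetic stays visible. The function names (`minmax_qparams`, `quantize`, `dequantize`) are hypothetical and are not part of Sparsebit's API; they only illustrate the kind of work an Observer and a Quantizer would typically do.

```python
def minmax_qparams(values, num_bits=8):
    """Derive scale and zero-point from the observed min/max (unsigned, asymmetric)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # widen the range so that 0.0 is representable
    hi = max(max(values), 0.0)
    if hi == lo:  # degenerate range: every observed value is 0.0
        return 1.0, 0, qmin, qmax
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point, qmin, qmax


def quantize(values, scale, zero_point, qmin, qmax):
    """Map floats to integers in [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to (approximate) floats."""
    return [(q - zero_point) * scale for q in q_values]


weights = [-1.0, -0.5, 0.0, 0.5, 3.0]
scale, zp, qmin, qmax = minmax_qparams(weights)
q = quantize(weights, scale, zp, qmin, qmax)
recovered = dequantize(q, scale, zp)

print("scale:", scale, "zero_point:", zp)
print("quantized:", q)
print("recovered:", recovered)
```

Values inside the observed range come back with an error of at most half a quantization step (`scale / 2`), and 0.0 maps exactly onto the zero-point. A QDQ-ONNX export stores this same quantize-then-dequantize pair explicitly in the graph so that a backend can fuse it into integer kernels.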

Results

PTQ results on ImageNet-1k: link
PTQ results of Vision Transformer on ImageNet-1k: link
PTQ results of YOLO related works on COCO: link
QAT results on ImageNet-1k: link

Sparse

In deep learning, "sparse" usually refers to operations that reduce a network's parameters or computation. The sparsification support in this toolbox currently has the following characteristics:

Supports two types of pruning: structured/unstructured;
Supports a variety of operation objects including: weights, activations, model-blocks, model-layers, etc.;
Supports multiple pruning algorithms: L1-norm/L0-norm/Fisher-pruning/Hrank/Slimming...
Users can easily add a custom pruning algorithm by defining a Sparser;
Uses ONNX as the export format for the pruned model.
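
The difference between unstructured and structured pruning listed above can be sketched with an L1-norm criterion on a small weight matrix. This is an illustration in plain Python, not Sparsebit's implementation: the function names are hypothetical, and each row of the matrix is assumed to stand for one output channel.

```python
def unstructured_l1_prune(weight, sparsity):
    """Zero the `sparsity` fraction of individual entries with the smallest |w|."""
    rows, cols = len(weight), len(weight[0])
    # Rank every (row, col) position by the magnitude of its weight.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weight[rc[0]][rc[1]]),
    )
    pruned = [row[:] for row in weight]
    for r, c in order[: int(rows * cols * sparsity)]:
        pruned[r][c] = 0.0
    return pruned


def structured_l1_prune(weight, sparsity):
    """Zero whole rows (output channels) with the smallest L1 norm."""
    norms = [sum(abs(w) for w in row) for row in weight]
    order = sorted(range(len(weight)), key=lambda r: norms[r])
    drop = set(order[: int(len(weight) * sparsity)])
    return [
        [0.0] * len(row) if r in drop else row[:]
        for r, row in enumerate(weight)
    ]


weight = [
    [0.9, -0.1, 0.4],
    [0.05, -0.02, 0.03],
    [-0.7, 0.6, -0.2],
    [0.3, -0.8, 0.15],
]

unstructured = unstructured_l1_prune(weight, sparsity=0.5)
structured = structured_l1_prune(weight, sparsity=0.25)

print("unstructured:", unstructured)
print("structured:", structured)
```

Unstructured pruning scatters zeros across the matrix, which keeps more of the large weights but needs sparse kernels to turn into a speedup. Structured pruning removes entire rows, so the layer can simply be made smaller in the exported model.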

Resources

Documentation

Detailed usage and development guidance can be found in the documentation. Refer to: docs

CV-Master

We maintain a public course on quantization at Bilibili, introducing the basics of quantization and our latest work. Interested users can join the course: video
To help users better understand and apply model compression, we designed homework based on Sparsebit. Interested users can complete it on their own: quantization_homework

Plan to re-implement

Join Us

You are welcome to join our team as a member (or an intern) if you are interested in Quantization, Pruning, Distillation, Self-Supervised Learning and Model Deployment.
Submit your resume to: [email protected]

Acknowledgement

Sparsebit was inspired by several open source projects. We are grateful for these excellent projects and list them as follows:

License

Sparsebit is released under the Apache 2.0 license.

Folders and files (history: 126 commits)
.github/workflows
ci
docs
examples
large_language_models
sparsebit
.gitignore
.gitmodules
.readthedocs.yaml
LICENSE
README.md
README_zh-CN.md
requirements-ci.txt
requirements.txt
setup.py
