This repo contains examples of exporting models (LightSeq, Fairseq-based, Hugging Face, etc.) to protobuf/hdf5 format, and then using LightSeq for fast inference. For each model, we provide three kinds of export: normal float models, quantized models trained with QAT (quantization aware training), and quantized models produced by PTQ (post training quantization).
Before doing anything, you need to switch to the current directory:

```shell
cd examples/inference/python
```
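Every row in the export table below ends in the same place: a protobuf or hdf5 file that LightSeq's Python inference API can load directly. As a minimal sketch of that final step (the file name is a placeholder for whatever an export script actually wrote, and the exact return value of `infer` varies by model type):

```python
import lightseq.inference as lsi

# "transformer.pb" is a placeholder: pass the protobuf/hdf5 file
# produced by one of the export scripts listed below.
model = lsi.Transformer("transformer.pb", 8)  # 8 = max batch size

# Input is a batch of source token id sequences; the result contains
# the generated token ids (and scores, depending on the model type).
result = model.infer([[63, 47, 65, 1507, 88, 74]])
```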
We provide the following export examples. All Fairseq-based models were trained using the scripts in examples/training/fairseq. The first two LightSeq Transformer models were trained using the scripts in examples/training/custom.
| Model | Type | Command | Resource | Description |
|---|---|---|---|---|
| LightSeq Transformer | Float | `python3 export/ls_transformer_export.py -m ckpt_ls_custom.pt` | link | Export LightSeq Transformer models to protobuf format. |
| LightSeq Transformer + PTQ | Int8 | `python3 export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt` | link | Export LightSeq Transformer models to int8 protobuf format using post training quantization. |
| Hugging Face BART | Float | `python3 export/huggingface/hf_bart_export.py` | / | Export Hugging Face BART models to protobuf/hdf5 format. |
| Hugging Face BERT | Float | `python3 export/huggingface/hf_bert_export.py` | / | Export Hugging Face BERT models to hdf5 format. |
| Hugging Face BERT + custom Torch layers + QAT | Int8 | `python3 export/huggingface/ls_torch_hf_quant_bert_export.py -m ckpt_ls_torch_hf_quant_bert_ner.bin` | / | Export quantized Hugging Face BERT models trained with custom Torch layers to hdf5 format. |
| Hugging Face GPT2 | Float | `python3 export/huggingface/hf_gpt2_export.py` | / | Export Hugging Face GPT2 models to hdf5 format. |
| Hugging Face GPT2 + custom Torch layers + QAT | Int8 | `python3 export/huggingface/ls_torch_hf_quant_gpt2_export.py -m ckpt_ls_torch_hf_quant_gpt2_ner.bin` | / | Export quantized Hugging Face GPT2 models trained with custom Torch layers to hdf5 format. |
| Hugging Face ViT | Float | `python3 export/huggingface/hf_vit_export.py` | / | Export Hugging Face ViT models to hdf5 format. |
| Native Fairseq Transformer | Float | `python3 export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt` | link | Export native Fairseq Transformer models to protobuf/hdf5 format. |
| Native Fairseq Transformer + PTQ | Int8 | `python3 export/fairseq/native_fs_transformer_ptq_export.py -m ckpt_native_fairseq_31.06.pt` | link | Export native Fairseq Transformer models to int8 protobuf format using post training quantization. |
| Fairseq + LightSeq Transformer | Float | `python3 export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt` | link | Export Fairseq Transformer models trained with LightSeq modules to protobuf/hdf5 format. |
| Fairseq + LightSeq Transformer + PTQ | Int8 | `python3 export/fairseq/ls_fs_transformer_ptq_export.py -m ckpt_ls_fairseq_31.17.pt` | link | Export Fairseq Transformer models trained with LightSeq modules to int8 protobuf format using post training quantization. |
| Fairseq + custom Torch layers | Float | `python3 export/fairseq/ls_torch_fs_transformer_export.py -m ckpt_ls_torch_fairseq_31.16.pt` | link | Export Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to protobuf format. |
| Fairseq + custom Torch layers + PTQ | Int8 | `python3 export/fairseq/ls_torch_fs_transformer_ptq_export.py -m ckpt_ls_torch_fairseq_31.16.pt` | link | Export Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to int8 protobuf format using post training quantization. |
| Fairseq + custom Torch layers + QAT | Int8 | `python3 export/fairseq/ls_torch_fs_quant_transformer_export.py -m ckpt_ls_torch_fairseq_quant_31.09.pt` | link | Export quantized Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to int8 protobuf format. |
| Native Fairseq MoE Transformer | Float | `python3 export/fairseq/native_fs_moe_transformer_export.py` | / | Export Fairseq MoE Transformer models to protobuf/hdf5 format. |
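All of the hdf5 exporters above share one basic mechanic: read tensors out of a trained checkpoint and write them into an hdf5 file under the names the LightSeq runtime expects. The sketch below shows only that mechanic with h5py; the paths are placeholders, and the one-to-one naming is a simplification of the renaming and reshaping the real scripts perform:

```python
import h5py
import torch

# Placeholder checkpoint path; assumes it loads to a name -> tensor dict.
state_dict = torch.load("ckpt_ls_custom.pt", map_location="cpu")
if "model" in state_dict:  # e.g. Fairseq checkpoints nest the weights
    state_dict = state_dict["model"]

with h5py.File("model.hdf5", "w") as f:
    for name, tensor in state_dict.items():
        # The real exporters map checkpoint names onto LightSeq's expected
        # hdf5 layout here; this sketch just dumps each tensor one-to-one.
        f.create_dataset(name, data=tensor.float().numpy())
```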
After exporting the Hugging Face models to protobuf/hdf5 format using the scripts above, you can run the following scripts to test fast LightSeq inference:

- BART

  ```shell
  python3 test/ls_bart.py
  ```

- BERT

  ```shell
  python3 test/ls_bert.py
  ```

- GPT2

  ```shell
  python3 test/ls_gpt2.py
  ```

- ViT

  ```shell
  python3 test/ls_vit.py
  ```

- Quantized BERT

  ```shell
  python3 test/ls_quant_bert.py
  ```

- Quantized GPT2

  ```shell
  python3 test/ls_quant_gpt.py
  ```
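Each test script follows the same pattern: tokenize some sample sentences, run both the original Hugging Face model and the exported LightSeq model, and compare outputs and latency. A condensed sketch of the LightSeq half of that pattern, where the exported file name is a placeholder for whatever hf_bart_export.py actually wrote:

```python
import time

import lightseq.inference as lsi
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
inputs = tokenizer(["I love that girl, but she does not love me."],
                   return_tensors="np", padding=True)

# Placeholder file name: pass the file written by the BART export script.
model = lsi.Transformer("lightseq_bart_base.hdf5", 8)

start = time.perf_counter()
outputs = model.infer(inputs["input_ids"])  # generated token ids
print(f"lightseq inference time: {time.perf_counter() - start:.3f}s")
```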
After exporting the Fairseq-based models to protobuf/hdf5 format using the scripts above, you can use the following script for fast LightSeq inference on the WMT14 En-De dataset, compatible with both fp16 and int8 models:

```shell
bash test/ls_fairseq.sh --model ${model_path}
```
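For example, to evaluate one of the quantized models exported above (the file name is a placeholder for whatever the PTQ export script wrote):

```shell
bash test/ls_fairseq.sh --model ckpt_ls_fairseq_ptq.pb
```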