# Model export and LightSeq inference

This repo contains examples of exporting models (LightSeq, Fairseq-based, Hugging Face, etc.) to protobuf/hdf5 format and then using LightSeq for fast inference. For each model, we provide float model export, quantization-aware training (QAT) model export, and post-training quantization (PTQ) model export.

Before doing anything, you need to switch to the current directory:

```shell
cd examples/inference/python
```
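
The workflow is the same for every model below: an export script converts a trained checkpoint into a protobuf or hdf5 file, and the LightSeq runtime then loads that file for inference. The following is a minimal sketch of the inference side, assuming the `lightseq` pip package is installed and that a float Transformer has already been exported to `transformer.pb` (a placeholder file name):

```python
# Minimal sketch: load an exported LightSeq model and run inference.
# Assumptions: the `lightseq` wheel is installed; "transformer.pb" is a
# placeholder for a file produced by one of the export scripts below.
import lightseq.inference as lsi

# First argument: the exported model file; second: the maximum batch
# size the engine will accept at inference time.
model = lsi.Transformer("transformer.pb", 8)

# Token ids must come from the same dictionary/tokenizer used in training.
input_ids = [[4, 15, 29, 2]]
results = model.infer(input_ids)  # generated ids (plus scores under beam search)
print(results)
```

See the scripts under `test/` for the exact layout of the returned results.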

## Model export

We provide the following export examples. All Fairseq-based models are trained using the scripts in `examples/training/fairseq`. The first two LightSeq Transformer models are trained using the scripts in `examples/training/custom`.

| Model | Type | Command | Resource | Description |
|-------|------|---------|----------|-------------|
| LightSeq Transformer | Float | `python3 export/ls_transformer_export.py -m ckpt_ls_custom.pt` | link | Export LightSeq Transformer models to protobuf format. |
| LightSeq Transformer + PTQ | Int8 | `python3 export/ls_transformer_ptq_export.py -m ckpt_ls_custom.pt` | link | Export LightSeq Transformer models to int8 protobuf format using post-training quantization. |
| Hugging Face BART | Float | `python3 export/huggingface/hf_bart_export.py` | / | Export Hugging Face BART models to protobuf/hdf5 format. |
| Hugging Face BERT | Float | `python3 export/huggingface/hf_bert_export.py` | / | Export Hugging Face BERT models to hdf5 format. |
| Hugging Face BERT + custom Torch layer + QAT | Int8 | `python3 export/huggingface/ls_torch_hf_quant_bert_export.py -m ckpt_ls_torch_hf_quant_bert_ner.bin` | / | Export Hugging Face BERT models trained with custom Torch layers to hdf5 format. |
| Hugging Face GPT2 | Float | `python3 export/huggingface/hf_gpt2_export.py` | / | Export Hugging Face GPT2 models to hdf5 format. |
| Hugging Face GPT2 + custom Torch layer + QAT | Int8 | `python3 export/huggingface/ls_torch_hf_quant_gpt2_export.py -m ckpt_ls_torch_hf_quant_gpt2_ner.bin` | / | Export Hugging Face GPT2 models trained with custom Torch layers to hdf5 format. |
| Hugging Face ViT | Float | `python3 export/huggingface/hf_vit_export.py` | / | Export Hugging Face ViT models to hdf5 format. |
| Native Fairseq Transformer | Float | `python3 export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt` | link | Export native Fairseq Transformer models to protobuf/hdf5 format. |
| Native Fairseq Transformer + PTQ | Int8 | `python3 export/fairseq/native_fs_transformer_export.py -m ckpt_native_fairseq_31.06.pt` | link | Export native Fairseq Transformer models to int8 protobuf format using post-training quantization. |
| Fairseq + LightSeq Transformer | Float | `python3 export/fairseq/ls_fs_transformer_export.py -m ckpt_ls_fairseq_31.17.pt` | link | Export Fairseq Transformer models trained with LightSeq modules to protobuf/hdf5 format. |
| Fairseq + LightSeq Transformer + PTQ | Int8 | `python3 export/fairseq/ls_fs_transformer_ptq_export.py -m ckpt_ls_fairseq_31.17.pt` | link | Export Fairseq Transformer models trained with LightSeq modules to int8 protobuf format using post-training quantization. |
| Fairseq + custom Torch layer | Float | `python3 export/fairseq/ls_torch_fs_transformer_export.py -m ckpt_ls_torch_fairseq_31.16.pt` | link | Export Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to protobuf format. |
| Fairseq + custom Torch layer + PTQ | Int8 | `python3 export/fairseq/ls_torch_fs_transformer_ptq_export.py -m ckpt_ls_torch_fairseq_31.16.pt` | link | Export Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to int8 protobuf format using post-training quantization. |
| Fairseq + custom Torch layer + QAT | Int8 | `python3 export/fairseq/ls_torch_fs_quant_transformer_export.py -m ckpt_ls_torch_fairseq_quant_31.09.pt` | link | Export quantized Fairseq Transformer models trained with custom Torch layers and other LightSeq modules to int8 protobuf format. |
| Native Fairseq MoE Transformer | Float | `python3 export/fairseq/native_fs_moe_transformer_export.py` | / | Export Fairseq MoE Transformer models to protobuf/hdf5 format. |
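
The int8 exports can be sanity-checked the same way as the float ones. Below is a short sketch assuming the installed LightSeq wheel exposes `QuantTransformer` for int8 protobuf files; the file name is again a placeholder:

```python
# Hedged sketch: load an int8 (PTQ/QAT) export. Use lsi.Transformer for
# float exports instead. "quant_transformer.pb" is a placeholder name.
import lightseq.inference as lsi

quant_model = lsi.QuantTransformer("quant_transformer.pb", 8)
print(quant_model.infer([[4, 15, 29, 2]]))
```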

## LightSeq inference

### Hugging Face models

1. BART

    ```shell
    python3 test/ls_bart.py
    ```

2. BERT

    ```shell
    python3 test/ls_bert.py
    ```

3. GPT2

    ```shell
    python3 test/ls_gpt2.py
    ```

4. ViT

    ```shell
    python3 test/ls_vit.py
    ```

5. Quantized BERT

    ```shell
    python3 test/ls_quant_bert.py
    ```

6. Quantized GPT2

    ```shell
    python3 test/ls_quant_gpt.py
    ```
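
As a rough sketch of what these scripts do on the LightSeq side, here is BART inference through the LightSeq runtime. It assumes `hf_bart_export.py` has already produced a protobuf file (the name below is a placeholder) and that the `transformers` package is installed; see `test/ls_bart.py` for the exact output handling:

```python
# Hedged sketch of Hugging Face BART inference through LightSeq.
# "lightseq_bart_base.pb" is a placeholder for the exported file.
import lightseq.inference as lsi
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = lsi.Transformer("lightseq_bart_base.pb", 128)  # model file, max batch size

ids = tokenizer("I love that girl, but <mask> does not <mask> me.")["input_ids"]
results = model.infer([ids])  # generated token ids (and beam scores)
print(results)
```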

### Fairseq-based models

After exporting the Fairseq-based models to protobuf/hdf5 format using the above scripts, you can use the following script for fast LightSeq inference on the WMT14 en2de dataset. It is compatible with both fp16 and int8 models:

```shell
bash test/ls_fairseq.sh --model ${model_path}
```
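
For example, to evaluate one of the exported checkpoints (the model path below is a placeholder for whatever file the export script wrote):

```shell
bash test/ls_fairseq.sh --model ls_fairseq_transformer.pb
```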