Evaluate performance of ONNX Runtime (Hugging Face Question Answering)

ONNX Runtime quantization is under active development. Please use onnxruntime 1.6.0+ to get broader quantization support.

This example loads a question answering model and confirms its accuracy and speed on the SQuAD task.

Environment

Please use the latest onnx and onnxruntime versions.

Prepare dataset

Download the SQuAD dataset from the SQuAD dataset link.
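SQuAD ships as nested JSON (articles, paragraphs, question/answer pairs). The sketch below shows how the evaluation data is typically flattened into per-question records; the sample record is illustrative, not taken from the real dataset, and the helper name is an assumption.

```python
import json

# Illustrative stand-in for dev-v1.1.json; the nesting mirrors the real
# SQuAD v1.1 layout, but the content here is made up for the example.
sample = {
    "data": [
        {
            "title": "Example",
            "paragraphs": [
                {
                    "context": "ONNX Runtime is a cross-platform inference engine.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is ONNX Runtime?",
                            "answers": [
                                {"text": "a cross-platform inference engine",
                                 "answer_start": 16}
                            ],
                        }
                    ],
                }
            ],
        }
    ]
}

def iter_examples(squad):
    """Flatten the nested SQuAD structure into (id, question, context, answers)."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield qa["id"], qa["question"], paragraph["context"], qa["answers"]

examples = list(iter_examples(sample))
```

With the real file you would replace `sample` with `json.load(open("dev-v1.1.json"))`.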

Prepare model

Supported model identifiers from huggingface.co:

Model Identifier
mrm8488/spanbert-finetuned-squadv1
salti/bert-base-multilingual-cased-finetuned-squad
```bash
python export.py --model_name_or_path=mrm8488/spanbert-finetuned-squadv1  # or other supported model identifier
```

Quantization

Dynamic quantize:

```bash
bash run_tuning.sh --input_model=/path/to/model \ # model path as *.onnx
                   --output_model=/path/to/model_tune \
                   --config=qa_dynamic.yaml
```

Benchmark

```bash
bash run_benchmark.sh --input_model=/path/to/model \ # model path as *.onnx
                      --config=qa_dynamic.yaml \
                      --mode=performance # or accuracy
```
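Conceptually, `--mode=performance` runs repeated timed inferences and reports latency statistics. A minimal sketch of that loop, with a placeholder workload standing in for an ONNX Runtime `session.run` call (the function names here are assumptions, not the script's actual API):

```python
import time
import statistics

def run_once():
    # Placeholder workload; in the real benchmark this would be an
    # onnxruntime InferenceSession.run call on a SQuAD batch.
    time.sleep(0.001)

def benchmark(fn, iters=20):
    """Time repeated runs of fn and summarize latency in milliseconds."""
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies) * 1e3,
        "p90_ms": latencies[int(0.9 * len(latencies))] * 1e3,
    }

stats = benchmark(run_once)
```

Accuracy mode instead scores predictions against the SQuAD references (exact match / F1).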