Skip to content

Latest commit

 

History

History
 
 

perftest

ONNXRuntime Performance Test

This tool provides the performance results using the ONNX Runtime with the specific execution provider to run the inference for a given model using the sample input test data. This tool can provide a reliable measurement for the inference latency usign ONNX Runtime on the device. The options to use with the tool are listed below:

onnxruntime_perf_test [options...] model_path result_file

Options:

-A: Disable memory arena.

-M: Disable memory pattern.

-P: Use parallel executor instead of sequential executor.

-c: [parallel runs]: Specifies the (max) number of runs to invoke simultaneously. Default:1.

-e: [cpu|cuda|mkldnn|tensorrt|ngraph|openvino|nuphar|acl]: Specifies the execution provider 'cpu','cuda','dnnn','tensorrt', 'ngraph', 'openvino', 'nuphar' or 'acl'. Default is 'cpu'.
    
-m: [test_mode]: Specifies the test mode. Value coulde be 'duration' or 'times'. Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times. Default:'duration'.
    
-o: [optimization level]: Default is 1. Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all). Please see __onnxruntime_c_api.h__ (enum GraphOptimizationLevel) for the full list of all optimization levels.

-u: [path to save optimized model]: Default is empty so no optimized model would be saved.

-p: [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.

-r: [repeated_times]: Specifies the repeated times if running in 'times' test mode.Default:1000.
    
-s: Show statistics result, like P75, P90.

-t: [seconds_to_run]: Specifies the seconds to run for 'duration' mode. Default:600.
    
-v: Show verbose information.
    
-x: [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes. A value of 0 means the test will auto-select a default. Must >=0.

-y: [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes), A value of 0 means the test will auto-select a default. Must >=0.

-h: help.

Model path and input data dependency: Performance test uses the same input structure as onnx_test_runner tool. It requrires the directory trees as below:

--ModelName
    --test_data_set_0
        --input0.pb
    --test_data_set_2
        --input0.pb
    --model.onnx

The path of model.onnx needs to be provided as <model_path> argument.

Sample output from the tool will look something like this:

Total time cost:58.8053
Total iterations:1000
Average time cost:58.8053 ms
Total run time:58.8102 s
Min Latency is 0.0559777sec
Max Latency is 0.0623472sec
P50 Latency is 0.0587108sec
P90 Latency is 0.0599845sec
P95 Latency is 0.0605676sec
P99 Latency is 0.0619517sec
P999 Latency is 0.0623472se