This repository provides a step-by-step guide and code for optimizing PIDNet, a state-of-the-art semantic segmentation model, using TorchScript, ONNX, and TensorRT.
- CUDA: 12.0 (driver: 525)
- cuDNN: 8.9
- TensorRT: 8.6
- PyCUDA
- JetPack: 4.6.2
- PyCUDA
- Clone this repository and download the pretrained model from the official PIDNet repository.
For TorchScript:
python tools/export.py --a pidnet-s --p ./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt --f torchscript
For ONNX:
python tools/export.py --a pidnet-s --p ./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt --f onnx
For TensorRT (using the above ONNX model):
trtexec --onnx=path/to/onnx/model --saveEngine=path/to/engine
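The three export steps can also be scripted in one place. The sketch below only assembles the command lines shown above; it assumes the `--a`, `--p`, and `--f` flags behave as in those commands. The helper name `build_export_commands` is hypothetical, and the ONNX and engine paths are placeholders, since the location where `tools/export.py` writes its output is not stated here.

```python
import shlex

# Weights path as used in the export commands of this guide.
WEIGHTS = "./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt"


def build_export_commands(onnx_path, engine_path, weights=WEIGHTS, arch="pidnet-s"):
    """Return the export commands from this guide as argv lists, in order.

    `onnx_path` and `engine_path` are placeholders supplied by the caller.
    """
    commands = []
    # TorchScript and ONNX exports use the same script with a different --f value.
    for fmt in ("torchscript", "onnx"):
        commands.append(
            ["python", "tools/export.py", "--a", arch, "--p", weights, "--f", fmt]
        )
    # The TensorRT engine is built from the exported ONNX model.
    commands.append(
        ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is executed by accident.
    for argv in build_export_commands("path/to/onnx/model", "path/to/engine"):
        print(shlex.join(argv))
```

To actually run the steps, each argv list could be passed to `subprocess.run(argv, check=True)`, which stops at the first failing command.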
- Run inference, for example with the PyTorch model:
python tools/inference.py --f pytorch
- Measure the inference speed of PIDNet-S for Cityscapes:
python models/speed/pidnet_speed.py --f all
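For readers who want to time their own inference call, here is a minimal, framework-agnostic sketch of an FPS measurement. It is not the logic of `pidnet_speed.py`; the function name `measure_fps` and its parameters are assumptions made for illustration. The optional `sync` hook matters on GPUs: with PyTorch, CUDA work is launched asynchronously, so passing `torch.cuda.synchronize` as `sync` would be the usual way to make sure the timed work has actually finished.

```python
import time


def measure_fps(run_once, warmup=10, iters=100, sync=None):
    """Return frames per second for `run_once`, averaged over `iters` calls.

    `sync` is an optional callable that blocks until pending device work is
    done (hypothetical hook; on CUDA this would typically be a synchronize call).
    """
    # Warm-up calls are excluded from timing (lazy initialization, caches, etc.).
    for _ in range(warmup):
        run_once()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return iters / elapsed


if __name__ == "__main__":
    # Stand-in CPU workload; replace with a real forward pass.
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.2f} FPS")
```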
Framework | FPS | % increase |
---|---|---|
PyTorch | 24.72 | - |
TorchScript | 27.09 | 9.59 |
ONNX (with TensorRT EP) | 33.52 | 35.60 |
TensorRT | 32.93 | 33.21 |
The speed test was performed on a single NVIDIA GeForce RTX 3050 GPU.
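The "% increase" column is consistent with a relative speedup over the PyTorch baseline, that is, `(FPS / FPS_PyTorch - 1) * 100`. A short sketch that recomputes it from the FPS values in the table (the function name `percent_increase` is chosen here for illustration):

```python
def percent_increase(fps, baseline_fps):
    """Relative FPS gain over the baseline, in percent."""
    return (fps / baseline_fps - 1.0) * 100.0


# FPS values copied from the table above.
results = {
    "PyTorch": 24.72,
    "TorchScript": 27.09,
    "ONNX (with TensorRT EP)": 33.52,
    "TensorRT": 32.93,
}

if __name__ == "__main__":
    baseline = results["PyTorch"]
    for name, fps in results.items():
        gain = percent_increase(fps, baseline)
        print(f"{name}: {fps:.2f} FPS, {gain:.2f}% over PyTorch")
```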