Real-time inference using TensorRT.
- convert ONNX models to trt engine files on the target hardware
- run YOLO models on the target hardware (the trt engine file is created automatically)
- run a serialized trt engine file on the target hardware
- Download a YOLO model
- Update the Makefile with your ARCH_BIN (see the arch_bin section below for details)
- Start the docker container
make build
make run
- Place your images in src/images
- Build the code
# inside the docker container
cd /src
mkdir build && cd build
cmake ..
make -j -l4
- Alternatively, to build an individual module on its own, follow these steps
# go to module of interest
$ cd /src/engine
# create build directory
$ mkdir build && cd build
# build the project
$ cmake .. && make -j
modules
The main CMakeLists.txt builds the following modules:
converter ----> converts a yolo ONNX model to a serialized TensorRT engine file (trt engine file)
engine ----> runs a trt engine file
yolo ----> runs a yolo model (converts it to a trt engine and runs it)
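For orientation, an ONNX-to-TensorRT conversion like the one the converter module performs boils down to roughly the following with the TensorRT C++ API. This is a minimal sketch, not the repo's actual code; the file names and the Logger class are placeholders.

// build (roughly): g++ onnx2trt_sketch.cpp -o onnx2trt_sketch -lnvinfer -lnvonnxparser
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>
#include <memory>

// Minimal logger required by the TensorRT builder.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Create a builder and an explicit-batch network definition.
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    const auto flags = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(flags));

    // Parse the ONNX model into the network (file name is an example).
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));
    if (!parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        std::cerr << "failed to parse the onnx model" << std::endl;
        return 1;
    }

    // Build a serialized engine; request FP16 when the GPU supports it.
    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    if (builder->platformHasFastFp16()) config->setFlag(nvinfer1::BuilderFlag::kFP16);
    auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));

    // Write the serialized engine (the trt engine file) to disk.
    std::ofstream out("model.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return 0;
}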
When you build from the main directory, the outputs look like this:
/src/build# tree -L 2 -I 'CMakeFiles'
.
|-- CMakeCache.txt
|-- Makefile
|-- cmake_install.cmake
|-- converter
| |-- Makefile
| |-- cmake_install.cmake
| `-- onnx2trt <<<<<<<<<< convert onnx 2 trt
|-- engine
| |-- Makefile
| |-- cmake_install.cmake
| `-- engine <<<<<<<<<< run serialized engine file
`-- yolo
|-- Makefile
|-- cmake_install.cmake
|-- detectImage <<<<<<<<<< run object detection on an image with a yolo model
|-- detectWebcam <<<<<<<<<< run object detection on a webcam with a yolo model
|-- libyolo.so
`-- profile <<<<<<<<<< measure yolo model execution time when running detection on an image
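Running a serialized engine file (what the engine binary does) follows this general flow with the TensorRT C++ API. Again a sketch under assumed static shapes and float inputs/outputs, not the repo's implementation:

// build (roughly): g++ run_engine_sketch.cpp -o run_engine_sketch -lnvinfer -lcudart
#include <NvInfer.h>
#include <cuda_runtime.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Read the serialized engine from disk (file name is an example).
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

    // Deserialize into an ICudaEngine and create an execution context.
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(blob.data(), blob.size()));
    auto context = std::unique_ptr<nvinfer1::IExecutionContext>(engine->createExecutionContext());

    // Allocate one device buffer per binding (assumes static shapes and float tensors).
    std::vector<void*> bindings(engine->getNbBindings());
    for (int i = 0; i < engine->getNbBindings(); ++i) {
        auto dims = engine->getBindingDimensions(i);
        size_t count = 1;
        for (int d = 0; d < dims.nbDims; ++d) count *= dims.d[d];
        cudaMalloc(&bindings[i], count * sizeof(float));
    }

    // Preprocess and copy the input into its binding (omitted), run inference,
    // then copy the output bindings back to the host for postprocessing (omitted).
    const bool ok = context->executeV2(bindings.data());
    std::cout << (ok ? "inference ok" : "inference failed") << std::endl;

    for (void* ptr : bindings) cudaFree(ptr);
    return 0;
}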
arch_bin
The Dockerfile has an ARG ARCH_BIN that is used to build OpenCV with CUDA support.
Check the NVIDIA docs to match your GPU and set ARCH_BIN in the Makefile accordingly.
# here we have GeForce GTX 1050. The docs label it as ARCH_BIN=6.1
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1050 Off | 00000000:01:00.0 Off | N/A |
...
...
+---------------------------------------------------------------------------------------+
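If you prefer to query the compute capability directly instead of looking the GPU up in the docs, the CUDA runtime reports it as major.minor, which is exactly the value ARCH_BIN expects. A small standalone sketch (file name is arbitrary):

// build (roughly): nvcc check_arch.cu -o check_arch
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, i);
        // Compute capability major.minor is the ARCH_BIN value (e.g. 6.1 for a GTX 1050).
        std::printf("GPU %d: %s -> ARCH_BIN=%d.%d\n", i, prop.name, prop.major, prop.minor);
    }
    return 0;
}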
version check
- check your versions (inside the docker container)
# TensorRT version
$ find / -name NvInferVersion.h -type f
/usr/include/x86_64-linux-gnu/NvInferVersion.h
# this displays TensorRT version 8.6.1
$ cat /usr/include/x86_64-linux-gnu/NvInferVersion.h | grep NV_TENSORRT | head -n 3
#define NV_TENSORRT_MAJOR 8 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 6 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
# this displays cuDNN version 8.9.1
$ cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2 | head -n 3
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 1
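You can also confirm at runtime that the linked libnvinfer matches those headers. getInferLibVersion() returns the library version as a single integer; a minimal sketch, assuming the TensorRT 8.x encoding (major*1000 + minor*100 + patch):

// build (roughly): g++ version_check.cpp -o version_check -lnvinfer
#include <NvInfer.h>
#include <cstdio>

int main() {
    // Version the headers were compiled against (8.6.1 in this container).
    std::printf("headers: %d.%d.%d\n", NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    // Version of the libnvinfer actually linked; 8.6.1 is reported as 8601 under the 8.x scheme.
    std::printf("library: %d\n", getInferLibVersion());
    return 0;
}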
tao converter
# make run puts you inside the docker container
# before running this, check the README.txt in /src/scripts/tao-converter, install any dependencies, and set the required paths
/tmp/tao-converter# export MODEL_PATH=~/path/to/folder
/tmp/tao-converter# export MODEL=replace_with_model_name
/tmp/tao-converter# export KEY=replace_with_nvidia_key
/tmp/tao-converter# ./tao-converter -k "${KEY}" -t fp16 -e "${MODEL_PATH}/${MODEL}.engine" -o output "${MODEL_PATH}/${MODEL}.etlt"
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/filer9wcjU
[INFO] ONNX IR version: 0.0.7
[INFO] Opset version: 13
[INFO] Producer name: pytorch
[INFO] Producer version: 1.10
trtexec
- ask for help
$ /usr/src/tensorrt/bin/trtexec --help
- profile model speed
# load an onnx file
$ export MODEL_PATH=/path/to/folder
$ export ONNX_NAME=model.onnx
$ export TRT_NAME=model.engine
$ /usr/src/tensorrt/bin/trtexec --onnx="${MODEL_PATH}/${ONNX_NAME}" --iterations=5 --workspace=4096
# load in a trt engine file
$ /usr/src/tensorrt/bin/trtexec --loadEngine="${MODEL_PATH}/${TRT_NAME}" --fp16 --batch=1 --iterations=50 --workspace=4096
# save logs to a file
$ /usr/src/tensorrt/bin/trtexec --loadEngine="${MODEL_PATH}/${TRT_NAME}" --fp16 --batch=1 --iterations=50 --workspace=4096 > stats.log
- model conversion
$ export MODEL_PATH=/path/to/folder
$ export MODEL_NAME=model
# convert the model to FP16 (if supported on hardware)
$ /usr/src/tensorrt/bin/trtexec --onnx="${MODEL_PATH}/${MODEL_NAME}.onnx" --saveEngine="${MODEL_PATH}/${MODEL_NAME}_fp16.engine" --useCudaGraph --fp16 > "${MODEL_NAME}_fp16.log"
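To check the "if supported on hardware" part before converting, the TensorRT builder can report whether the GPU has fast native FP16 (and INT8) support. A short sketch:

// build (roughly): g++ fp16_check.cpp -o fp16_check -lnvinfer
#include <NvInfer.h>
#include <cstdio>
#include <memory>

// No-op logger just to satisfy the builder interface.
class Logger : public nvinfer1::ILogger {
    void log(Severity, const char*) noexcept override {}
};

int main() {
    Logger logger;
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    // Reports whether the GPU has fast native FP16 / INT8 kernels.
    std::printf("fast FP16: %s\n", builder->platformHasFastFp16() ? "yes" : "no");
    std::printf("fast INT8: %s\n", builder->platformHasFastInt8() ? "yes" : "no");
    return 0;
}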