- Windows 10 laptop
- CPU i7-11375H
- GPU RTX-3060
- Visual Studio 2017
- CUDA 11.1
- TensorRT 8.0.3.4 (unet)
- TensorRT 8.2.0.6 (detr, yolov5s, real-esrgan)
- OpenCV 3.4.5
- Create an Engine directory for the generated engine files
- Create an Int8_calib_table directory for the PTQ calibration tables
- Layer for input preprocessing (NHWC -> NCHW, BGR -> RGB, [0, 255] -> [0, 1] normalization); see the kernel sketch after this list
  - plugin_ex1.cpp (plugin sample code)
  - preprocess.hpp (plugin definition)
  - preprocess.cu (preprocessing CUDA kernel function)
  - Validation_py/Validation_preproc.py (result validation against PyTorch)
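As a rough illustration of what the fused preprocessing does, here is a minimal CUDA kernel sketch; the kernel name, launch configuration, and exact indexing are assumptions for this example, not the exact contents of preprocess.cu:

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Fuses NHWC -> NCHW layout change, BGR -> RGB channel swap, and [0, 255] -> [0, 1]
// scaling. One thread writes one output element (illustrative kernel, batch size 1).
__global__ void preprocess_kernel(const uint8_t* __restrict__ src,  // H x W x 3, BGR, uint8
                                  float* __restrict__ dst,          // 3 x H x W, RGB, float
                                  int height, int width)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int plane = height * width;
    if (idx >= 3 * plane) return;

    int c = idx / plane;   // destination channel: 0 = R, 1 = G, 2 = B
    int hw = idx % plane;  // pixel index within one channel plane
    int src_c = 2 - c;     // BGR -> RGB swap
    dst[idx] = src[hw * 3 + src_c] / 255.f;
}

// Host-side launcher; inside a TensorRT plugin, 'stream' would be the enqueue stream.
void launch_preprocess(const uint8_t* d_src, float* d_dst,
                       int height, int width, cudaStream_t stream)
{
    int total = 3 * height * width;
    int block = 256;
    int grid = (total + block - 1) / block;
    preprocess_kernel<<<grid, block, 0, stream>>>(d_src, d_dst, height, width);
}
```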
- vgg11.cpp
  - with preprocess plugin
- resnet18.cpp
  - 100 images from the COCO val2017 dataset for PTQ calibration (a calibrator sketch follows the table below)
  - All results match PyTorch
  - Comparison of average execution time over 100 iterations and GPU memory usage for one 224x224x3 image
|                        | PyTorch | TensorRT | TensorRT | TensorRT   |
| ---------------------- | ------- | -------- | -------- | ---------- |
| Precision              | FP32    | FP32     | FP16     | Int8 (PTQ) |
| Avg duration time [ms] | 4.1     | 1.7      | 0.7      | 0.6        |
| FPS [frame/sec]        | 243     | 590      | 1385     | 1577       |
| Memory [GB]            | 1.551   | 1.288    | 0.941    | 0.917      |
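For the Int8 (PTQ) column above, TensorRT needs a calibrator that feeds the 100 COCO images and caches the resulting table in Int8_calib_table. A minimal sketch based on TensorRT's IInt8EntropyCalibrator2 interface follows; the class name, single-image batches, and preprocessing details are assumptions for illustration and may differ from the repository's calibrator:

```cpp
#include "NvInfer.h"
#include <cuda_runtime.h>
#include <opencv2/opencv.hpp>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical PTQ calibrator sketch feeding calibration images one at a time.
class EntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    EntropyCalibrator(const std::vector<std::string>& imagePaths, int inputH, int inputW,
                      const std::string& cacheFile)
        : mPaths(imagePaths), mH(inputH), mW(inputW), mCacheFile(cacheFile)
    {
        cudaMalloc(&mDeviceInput, 3 * mH * mW * sizeof(float));
    }
    ~EntropyCalibrator() override { cudaFree(mDeviceInput); }

    int32_t getBatchSize() const noexcept override { return 1; }

    bool getBatch(void* bindings[], const char* /*names*/[], int32_t /*nbBindings*/) noexcept override
    {
        if (mIndex >= static_cast<int>(mPaths.size())) return false;  // no more calibration data
        cv::Mat img = cv::imread(mPaths[mIndex++]);                   // BGR, HWC, uint8
        cv::resize(img, img, cv::Size(mW, mH));
        // Same preprocessing as inference: BGR -> RGB, HWC -> CHW, scale to [0, 1].
        std::vector<float> blob(3 * mH * mW);
        for (int c = 0; c < 3; ++c)
            for (int i = 0; i < mH * mW; ++i)
                blob[c * mH * mW + i] = img.data[i * 3 + (2 - c)] / 255.f;
        cudaMemcpy(mDeviceInput, blob.data(), blob.size() * sizeof(float), cudaMemcpyHostToDevice);
        bindings[0] = mDeviceInput;
        return true;
    }

    const void* readCalibrationCache(std::size_t& length) noexcept override
    {
        // Reuse an existing table from Int8_calib_table so calibration runs only once.
        mCache.clear();
        std::ifstream in(mCacheFile, std::ios::binary);
        if (in) mCache.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
        length = mCache.size();
        return mCache.empty() ? nullptr : mCache.data();
    }

    void writeCalibrationCache(const void* cache, std::size_t length) noexcept override
    {
        std::ofstream out(mCacheFile, std::ios::binary);
        out.write(static_cast<const char*>(cache), length);
    }

private:
    std::vector<std::string> mPaths;
    int mH, mW, mIndex = 0;
    std::string mCacheFile;
    void* mDeviceInput = nullptr;
    std::vector<char> mCache;
};
```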
- UNet model (unet.cpp)
  - Use TensorRT 8.0.3.4 for the UNet model (version 8.2.0.6 raises an error when building the UNet model)
  - unet_carvana_scale0.5_epoch1.pth
  - Additional preprocessing (resize & letterbox padding) with OpenCV; see the sketch after the table below
  - Postprocessing (model output to image)
  - All results match PyTorch
  - Comparison of average execution time over 100 iterations and GPU memory usage for one 512x512x3 image
|                        | PyTorch | PyTorch | TensorRT | TensorRT | TensorRT   |
| ---------------------- | ------- | ------- | -------- | -------- | ---------- |
| Precision              | FP32    | FP16    | FP32     | FP16     | Int8 (PTQ) |
| Avg duration time [ms] | 66.21   | 34.58   | 40.81    | 13.52    | 8.19       |
| FPS [frame/sec]        | 15      | 29      | 25       | 77       | 125        |
| Memory [GB]            | 3.863   | 2.677   | 1.552    | 1.367    | 1.051      |
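A minimal sketch of the resize & letterbox-padding step with OpenCV, assuming a hypothetical letterbox helper and the common gray pad value 114; the repository's preprocessing may differ in these details:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>

// Hypothetical letterbox helper: resize while keeping aspect ratio, then pad
// to the network input size.
cv::Mat letterbox(const cv::Mat& src, int dstW, int dstH,
                  const cv::Scalar& padColor = cv::Scalar(114, 114, 114))
{
    float scale = std::min(dstW / static_cast<float>(src.cols),
                           dstH / static_cast<float>(src.rows));
    int newW = static_cast<int>(src.cols * scale);
    int newH = static_cast<int>(src.rows * scale);

    cv::Mat resized;
    cv::resize(src, resized, cv::Size(newW, newH));

    // Center the resized image and fill the remaining border with the pad color.
    int padW = dstW - newW;
    int padH = dstH - newH;
    cv::Mat dst;
    cv::copyMakeBorder(resized, dst,
                       padH / 2, padH - padH / 2,
                       padW / 2, padW - padW / 2,
                       cv::BORDER_CONSTANT, padColor);
    return dst;  // dstH x dstW x 3, BGR; layout change and normalization are done by the preprocess plugin
}
```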
- DETR model (detr_trt.cpp)
  - Additional preprocessing (mean/std normalization); see the sketch after the table below
  - Postprocessing (draw detection results on the image)
  - All results match PyTorch
  - Comparison of average execution time over 100 iterations and GPU memory usage for one 500x500x3 image
|                        | PyTorch | PyTorch | TensorRT | TensorRT | TensorRT   |
| ---------------------- | ------- | ------- | -------- | -------- | ---------- |
| Precision              | FP32    | FP16    | FP32     | FP16     | Int8 (PTQ) |
| Avg duration time [ms] | 37.03   | 30.71   | 16.40    | 6.07     | 5.30       |
| FPS [frame/sec]        | 27      | 33      | 61       | 165      | 189        |
| Memory [GB]            | 1.563   | 1.511   | 1.212    | 1.091    | 1.005      |
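A minimal sketch of the extra mean/std normalization, assuming the ImageNet statistics commonly used with DETR's torchvision backbone and a CHW float blob already scaled to [0, 1]; the helper name is hypothetical:

```cpp
#include <vector>

// Hypothetical helper: applies per-channel mean/std normalization in place to a
// CHW float blob in RGB order that has already been scaled to [0, 1].
void normalize_chw(std::vector<float>& blob, int height, int width)
{
    const float mean[3] = {0.485f, 0.456f, 0.406f};  // R, G, B (ImageNet statistics)
    const float stdv[3] = {0.229f, 0.224f, 0.225f};
    const int plane = height * width;
    for (int c = 0; c < 3; ++c)
        for (int i = 0; i < plane; ++i)
            blob[c * plane + i] = (blob[c * plane + i] - mean[c]) / stdv[c];
}
```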
- Yolov5s model (yolov5s.cpp)
  - Comparison of average execution time over 100 iterations and GPU memory usage for one 640x640x3 resized & padded image
|                        | PyTorch | TensorRT | TensorRT   |
| ---------------------- | ------- | -------- | ---------- |
| Precision              | FP32    | FP32     | Int8 (PTQ) |
| Avg duration time [ms] | 7.72    | 6.16     | 2.86       |
| FPS [frame/sec]        | 129     | 162      | 350        |
| Memory [GB]            | 1.670   | 1.359    | 0.920      |
- Real-ESRGAN model (real-esrgan.cpp)
  - RealESRGAN_x4plus.pth
  - Scale up 4x (448x640x3 -> 1792x2560x3)
  - Comparison of average execution time over 100 iterations and GPU memory usage
  - [update] RealESRGAN_x2plus model (set OUT_SCALE=2)
|                        | PyTorch | PyTorch | TensorRT | TensorRT |
| ---------------------- | ------- | ------- | -------- | -------- |
| Precision              | FP32    | FP16    | FP32     | FP16     |
| Avg duration time [ms] | 4109    | 1936    | 2139     | 737      |
| FPS [frame/sec]        | 0.24    | 0.52    | 0.47     | 1.35     |
| Memory [GB]            | 5.029   | 4.407   | 3.807    | 3.311    |
- Yolov6s model (yolov6.cpp)
  - Comparison of average execution time over 1000 iterations and GPU memory usage (with preprocessing, without NMS, 536x640x3)
|                        | PyTorch | TensorRT | TensorRT | TensorRT   |
| ---------------------- | ------- | -------- | -------- | ---------- |
| Precision              | FP32    | FP32     | FP16     | Int8 (PTQ) |
| Avg duration time [ms] | 20.7    | 10.3     | 3.54     | 2.58       |
| FPS [frame/sec]        | 48.14   | 96.21    | 282.26   | 387.89     |
| Memory [GB]            | 1.582   | 1.323    | 0.956    | 0.913      |
- Yolov7 model (yolov7.cpp)
- TRT_DLL_EX : https://github.com/yester31/TRT_DLL_EX
1. Prepare the trained model in the training framework (generate the weight file to be used in TensorRT).
2. Implement the model with the TensorRT API to match the trained model structure.
3. Extract the weights from the trained model.
4. Make sure the weights are passed appropriately to each layer of the prepared TensorRT model.
5. Build and run (see the build/serialization sketch after this list).
   - After the TensorRT model is built, the model stream is serialized and saved as an engine file.
   - Subsequent runs perform inference by loading only the engine file (if model parameters or layers are modified, re-execute from step (4)).
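A condensed sketch of the build/serialize/load flow described above, assuming the TensorRT 8 C++ API; the function names and file handling are illustrative, and the repository's sample code is more elaborate (precision flags, calibrator, bindings, and so on):

```cpp
#include "NvInfer.h"
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

using namespace nvinfer1;

// Minimal logger required by the TensorRT builder and runtime.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

// Build the network with the TensorRT API and serialize it to an engine file
// (e.g. under the Engine directory). The layer-by-layer network definition that
// consumes the extracted weights is omitted and marked with a placeholder.
void buildAndSaveEngine(const char* enginePath)
{
    IBuilder* builder = createInferBuilder(gLogger);
    const auto flags = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    INetworkDefinition* network = builder->createNetworkV2(flags);
    IBuilderConfig* config = builder->createBuilderConfig();
    // config->setFlag(BuilderFlag::kFP16);  // or kINT8 together with a calibrator for PTQ

    // ... add inputs, layers, and outputs here, passing the extracted weights ...

    IHostMemory* serialized = builder->buildSerializedNetwork(*network, *config);
    std::ofstream out(enginePath, std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());

    delete serialized;
    delete config;
    delete network;
    delete builder;
}

// Subsequent runs skip the build step: deserialize the engine file and run inference.
ICudaEngine* loadEngine(const char* enginePath, IRuntime* runtime)
{
    std::ifstream in(enginePath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    return runtime->deserializeCudaEngine(blob.data(), blob.size());
}
```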
- tensorrtx : https://github.com/wang-xinyu/tensorrtx
- unet : https://github.com/milesial/Pytorch-UNet
- detr : https://github.com/facebookresearch/detr
- yolov5 : https://github.com/ultralytics/yolov5
- real-esrgan : https://github.com/xinntao/Real-ESRGAN
- yolov6 : https://github.com/meituan/YOLOv6