Semantic Segmentation with tkDNN

Currently, tkDNN supports only ShelfNet as a semantic segmentation network.

Run the demo

To run the semantic segmentation demo, follow these steps (example with ShelfNet):

rm shelfnet_fp32.rt        # be sure to delete (or move) old TensorRT files
export TKDNN_BATCHSIZE=4   # use a batch size greater than 1 to run inference on images bigger than 1024x1024
./test_shelfnet            # run the ShelfNet test (it is slow)
./seg_demo shelfnet_fp32.rt ../demo/yolo_test.mp4 1 19

In general the demo program takes the following parameters:

./seg_demo <network-rt-file> <path-to-video> <n-batches> <number-of-classes> <resize-flag> <baseline-resize> <show-flag> <write-pred>

where

<network-rt-file> is the rt file generated by a test.
<path-to-video> is the path to a video file or a camera input.
<n-batches> is the number of batches to use in inference (N.B. you should first export TKDNN_BATCHSIZE set to the required n-batches value and regenerate the rt file for the network).
<number-of-classes> is the number of classes the network is trained on.
<resize-flag> if set to 0, the demo uses the input frames as they are; otherwise it resizes them.
<baseline-resize> if <resize-flag> is set to 1, the input frames are resized proportionally, using <baseline-resize> as the width baseline.
<show-flag> if set to 0, the demo does not show the visualization but saves the video to result.mp4 (if n-batches == 1).
<write-pred> if set to 0 (default), the demo runs normally; otherwise the evaluation of a dataset runs and the segmentation output is saved. Attention: this is under development and the paths are hard-coded, so change them in the code in advance.
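To make the argument order concrete, here is a minimal sketch that assembles a full seg_demo command line and prints it instead of running it. The helper name `build_seg_demo_cmd` is hypothetical, and every default value except `<write-pred>` (documented as 0) is an assumption chosen for illustration, not a tkDNN default.

```bash
# Hypothetical helper: prints a seg_demo command line in the documented
# argument order. Defaults are illustrative assumptions, not tkDNN defaults.
build_seg_demo_cmd() {
    local rt_file="${1:-shelfnet_fp32.rt}"      # <network-rt-file>
    local video="${2:-../demo/yolo_test.mp4}"   # <path-to-video>
    local n_batches="${3:-1}"                   # <n-batches>
    local n_classes="${4:-19}"                  # <number-of-classes>
    local resize_flag="${5:-1}"                 # <resize-flag>
    local baseline="${6:-1024}"                 # <baseline-resize>
    local show_flag="${7:-1}"                   # <show-flag>
    local write_pred="${8:-0}"                  # <write-pred>
    echo "./seg_demo $rt_file $video $n_batches $n_classes $resize_flag $baseline $show_flag $write_pred"
}

# Print the command with all (assumed) defaults.
build_seg_demo_cmd
```

Printing the command rather than executing it makes it easy to check that each value sits in the intended position before launching the demo.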

NB) FP32 inference is used by default.

NB) Batching is not used to process multiple streams, but to process multiple tiles of the same image. ShelfNet never resizes the input image; therefore, for images larger than 1024x1024, tiles of 1024x1024 are fed to the network as a batch.
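The tiling note implies a simple relation between image size and the batch size to export. The sketch below is only an approximation under two assumptions: tiles lie on a regular grid, and a partially covered tile still counts as one tile (how tkDNN treats image borders is not stated here). `tiles_needed` is a hypothetical helper name.

```bash
# Hypothetical helper: number of tiles needed to cover a WIDTH x HEIGHT image.
# The tile side defaults to 1024, the size mentioned in the note above.
tiles_needed() {
    local width="$1" height="$2" tile="${3:-1024}"
    # Ceiling division per axis: a partly covered tile still counts as one.
    local cols=$(( (width + tile - 1) / tile ))
    local rows=$(( (height + tile - 1) / tile ))
    echo $(( cols * rows ))
}

tiles_needed 1024 1024   # a single tile
tiles_needed 2048 2048   # a 2x2 grid of tiles
```

Under these assumptions a 2048x2048 image needs 4 tiles, which matches the `TKDNN_BATCHSIZE=4` used in the run example.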

For other demo videos refer to this playlist.

NB) The gif and the videos are obtained with Mapillary Vistas weights, which we cannot share publicly due to license restrictions. However, you can train ShelfNet using Mapillary and this fork of the original repo.

FPS Results

Inference FPS of ShelfNet with tkDNN, averaged over 1200 images, on:

RTX 2080Ti (CUDA 10.2, TensorRT 7.0.0, cuDNN 7.6.5);
AGX Xavier, JetPack 4.3 (CUDA 10.0, cuDNN 7.6.3, TensorRT 6.0.1).

Platform	Test	Phase	FP32, ms	FP32, FPS	FP16, ms	FP16, FPS	INT8, ms	INT8, FPS
RTX 2080Ti	shelfnet 1024x1024 (B=1)	pre	6.11863	163.435	5.81465	171.979	5.88699	169.866
RTX 2080Ti	shelfnet 1024x1024 (B=1)	inf	11.5464	86.6074	7.35396	135.981	6.37623	156.832
RTX 2080Ti	shelfnet 1024x1024 (B=1)	post	4.09058	244.464	3.91961	255.128	4.07343	245.493
RTX 2080Ti	shelfnet 1024x1024 (B=1)	tot	21.7556	45.9652	17.0882	58.5199	16.3366	61.2121
RTX 2080Ti	shelfnet 2048x2048 (B=4)	pre	25.435	39.3158	25.2953	39.5331	25.9303	38.565
RTX 2080Ti	shelfnet 2048x2048 (B=4)	inf	36.5015	27.3961	17.0534	58.6395	15.6061	64.0773
RTX 2080Ti	shelfnet 2048x2048 (B=4)	post	17.3917	57.4985	17.1649	58.2583	17.5539	56.9675
RTX 2080Ti	shelfnet 2048x2048 (B=4)	tot	79.3283	12.6058	59.5136	16.8029	59.0903	16.9233
AGX Xavier	shelfnet 1024x1024 (B=1)	pre	8.0174	124.729	7.5117	133.126	7.47333	133.809
AGX Xavier	shelfnet 1024x1024 (B=1)	inf	72.4173	13.8089	37.505	26.6631	31.3286	31.9197
AGX Xavier	shelfnet 1024x1024 (B=1)	post	8.89958	112.365	8.83576	113.176	9.42655	106.083
AGX Xavier	shelfnet 1024x1024 (B=1)	tot	89.3342	11.1939	53.8525	18.5692	48.2285	20.7346
AGX Xavier	shelfnet 2048x2048 (B=4)	pre	47.1454	21.211	21.6475	46.1947	21.4201	46.6851
AGX Xavier	shelfnet 2048x2048 (B=4)	inf	266.537	3.75183	128.321	7.79293	107.621	9.29185
AGX Xavier	shelfnet 2048x2048 (B=4)	post	44.0711	22.6906	40.1732	24.8922	39.873	25.0796
AGX Xavier	shelfnet 2048x2048 (B=4)	tot	357.753	2.79522	190.142	5.25922	168.914	5.92016
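The numbers in the table appear to follow two simple relations: each "tot" latency is the sum of the pre, inf, and post latencies, and each FPS value is 1000 divided by the latency in milliseconds. A minimal sketch to reproduce a "tot" row from its three phases, where `total_fps` is a hypothetical helper name and `awk` is assumed to be available:

```bash
# Hypothetical helper: sum per-phase latencies (ms) and convert the total to FPS.
total_fps() {
    awk -v pre="$1" -v inf="$2" -v post="$3" \
        'BEGIN { total = pre + inf + post; printf "%.2f ms, %.2f FPS\n", total, 1000 / total }'
}

# FP32 phases for the RTX 2080Ti at 1024x1024 (B=1), copied from the table.
total_fps 6.11863 11.5464 4.09058
```

The result should land close to the 21.7556 ms and 45.9652 FPS reported in the corresponding "tot" row; small differences come from rounding in the published values.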

Known issues

When creating the rt file, all the checks return errors. This is due to a different resize function and a different handling of the original ShelfNet outputs. However, the network is expected to work.
