The Post-processing of YOLOv4 (C++ Version)
The YOLO-v4 pipeline consists of five main steps: data input, preprocessing, inference, post-processing, and drawing. Here we focus on the last two parts, post-processing and drawing.
Python generally runs slower than compiled languages such as C and C++. Therefore, I rewrote the Python version of the YOLO-v4 post-processing in C++ to test whether the total runtime of post-processing and result drawing could be sped up.
Here we use the three output layers obtained from YOLO-v4 inference, namely conv2d_58_Conv2D_YoloRegion, conv2d_66_Conv2D_YoloRegion, and conv2d_74_Conv2D_YoloRegion, to implement the YOLO-v4 post-processing in C++.
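For reference, a minimal sketch of how the three output layers might be listed before post-processing. The shapes in the comments are assumptions based on the common 416×416 YOLO input resolution (255 = 3 anchors × (80 classes + 5)) and may differ for other input sizes:

```cpp
#include <string>
#include <vector>

// The three YoloRegion output layer names used throughout this post-processing.
// The shapes in the comments are assumed for a 416x416 input.
const std::vector<std::string> kOutputLayers = {
    "conv2d_58_Conv2D_YoloRegion",  // e.g. 1 x 255 x 13 x 13, coarsest grid
    "conv2d_66_Conv2D_YoloRegion",  // e.g. 1 x 255 x 26 x 26
    "conv2d_74_Conv2D_YoloRegion",  // e.g. 1 x 255 x 52 x 52, finest grid
};
```

The tables below compare the Python and C++ runtimes at three confidence thresholds.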
Threshold 0.99

| | Reshape | Filter | NMS | Total post-process | Drawing | Total runtime |
|---|---|---|---|---|---|---|
| Python | 1.1339 | 2.8736 | 0.0996 | 1.5921 | | |
| C++ | 2.529 | 0.002 | 7.195 | 8.128 | | |

Threshold 0.6

| | Reshape | Filter | NMS | Total post-process | Drawing | Total runtime |
|---|---|---|---|---|---|---|
| Python | 0.4694 | 3.3707 | 1.9178 | 9.7203 | 15.5413 | |
| C++ | 2.583 | 0.026 | 7.321 | | | |

Threshold 0.1

| | Reshape | Filter | NMS | Total post-process | Drawing | Total runtime |
|---|---|---|---|---|---|---|
| Python | 1.0774 | 7.1897 | 13.7035 | 22.1300 | 28.6309 | 50.7609 |
| C++ | 2.826 | 0.161 | | | | |
While testing the code, we discovered that the "transpose" call in the "reshape" section took up around half of the total post-processing runtime. Since Python's NumPy module uses BLAS and LAPACK to execute matrix, vector, and linear-algebra operations, we came up with the idea of addressing this issue with the xtensor-blas module.
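As a rough illustration of how such a measurement can be taken, here is a sketch that times the transpose with std::chrono. It assumes the pre-improvement reshape used xt::transpose to permute NCHW to NHWC, and the tensor shape is just an assumed example:

```cpp
#include <chrono>
#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>
#include <xtensor/xmanipulation.hpp>

int main() {
    // Dummy prediction tensor standing in for one YoloRegion output (assumed shape).
    xt::xarray<float> predictions = xt::zeros<float>({1, 255, 52, 52});

    auto t0 = std::chrono::steady_clock::now();
    // xt::transpose builds a lazy view; assigning it to an xarray
    // forces the actual element-by-element copy we want to time.
    xt::xarray<float> nhwc = xt::transpose(predictions, {0, 2, 3, 1});
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "transpose took "
              << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
              << " us" << std::endl;
    return 0;
}
```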
Improvement 1
Here the transpose function is replaced with the code below:
```cpp
#include <cstddef>
#include <xtensor/xarray.hpp>

// Transpose a prediction tensor from NCHW to NHWC layout with explicit loops.
xt::xarray<float> transpose(xt::xarray<float>& predictions) {
    // Target shape (N, H, W, C), built from the source (N, C, H, W) shape.
    xt::xarray<float>::shape_type shape = {predictions.shape()[0],
                                           predictions.shape()[2],
                                           predictions.shape()[3],
                                           predictions.shape()[1]};
    xt::xarray<float> new_predictions(shape);
    for (std::size_t n = 0; n < predictions.shape()[0]; n++) {
        for (std::size_t h = 0; h < predictions.shape()[2]; h++) {
            for (std::size_t w = 0; w < predictions.shape()[3]; w++) {
                for (std::size_t c = 0; c < predictions.shape()[1]; c++) {
                    // Copy each element directly into its new position.
                    new_predictions(n, h, w, c) = predictions(n, c, h, w);
                }
            }
        }
    }
    return new_predictions;
}
```
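A quick usage sketch; the tensor shape here is an assumed example for one YOLO output scale:

```cpp
#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>

// Hypothetical example: one YOLO output scale in NCHW layout.
xt::xarray<float> preds = xt::zeros<float>({1, 255, 13, 13});
xt::xarray<float> nhwc = transpose(preds);  // resulting shape: (1, 13, 13, 255)
```

Unlike a lazy transposed view, this version eagerly copies every element into a freshly allocated contiguous array, which is intended to reduce the reshape time measured above.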
| Threshold | Reshape (before) | Total runtime (before) | Reshape (after) | Total runtime (after) |
|---|---|---|---|---|
| 0.99 | 4.423 | 8.128 | | |
| 0.6 | 4.453 | 12.731 | | |
| 0.1 | 4.579 | 25.848 | | |
Install xtensor

First, install xtl:

```bash
cd /opt
git clone https://github.com/xtensor-stack/xtl.git
cd xtl
cmake -DCMAKE_INSTALL_PREFIX=/opt/xtl .
make install
```

Then, install xtensor:

```bash
cd /opt
git clone https://github.com/xtensor-stack/xtensor.git
cd xtensor
cmake -DCMAKE_INSTALL_PREFIX=/opt/xtensor .
make install
```
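Since the xtensor-blas module is mentioned above for the BLAS-backed operations, it can presumably be installed following the same pattern (it additionally requires a BLAS/LAPACK implementation such as OpenBLAS to be available on the system); the install prefix below is just an assumption mirroring the steps above:

```bash
cd /opt
git clone https://github.com/xtensor-stack/xtensor-blas.git
cd xtensor-blas
cmake -DCMAKE_INSTALL_PREFIX=/opt/xtensor-blas .
make install
```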
To run the code

First, change into the build directory, then configure and build:

```bash
cmake -DCMAKE_INSTALL_PREFIX=/opt/ ..
make
```

Then run the compiled post-process executable:

```bash
./pp obj_input.jpg
```

When the code runs successfully, the results are printed in microseconds (multiply by 0.001 to convert to milliseconds).
References

https://superfastpython.com/what-is-blas-and-lapack-in-numpy/
https://max-c.notion.site/C-Numpy-Python-NPY-efe8a325aacb43ec9827f86185220fdc