Implemented fast median filter for CUDA using Wavelet Matrix, a constant-time, HDR-compatible method #3627
…x to the median filter for CUDA.
Here is a comparison of pre- and post-PR performance tests.
RTX3090, Core i9-9900X, Ubuntu 20.04, CUDA 12.3
(old) x1.98
Tesla P100, Xeon E5-2650L v4, Ubuntu 18.04, CUDA 11.6
(old) x3.19
Tesla V100, Xeon E5-2695 v4, Ubuntu 18.04, CUDA 11.6
(old) x3.20
Tesla A100, Xeon Gold 6326, Ubuntu 20.04, CUDA 11.6
(old) x1.33
I also checked the build on Windows. RTX A6000, i9-13900K, Windows 11, CUDA 11.6
(old) x2.1
The CI test failed, so I fixed the code.
I found that older versions of the compiler did not support
@opencv-alalek I have completed the code fixes and verified that the code passes build and test in various environments; it is ready to be merged. Would you please test it?
/cc @cudawarped
Thanks @opencv-alalek. I really would like to take a look at this, but I'm not going to have time for several weeks or longer, so it's probably best to count me out on this one. Thank you.
@@ -703,6 +704,18 @@ INSTANTIATE_TEST_CASE_P(CUDA_Filters, Median, testing::Combine(
WHOLE_SUBMAT)
);

}} // namespace
#ifdef __OPENCV_USE_WAVELET_MATRIX_FOR_MEDIAN_FILTER_CUDA__
Let's run these tests regardless of optimization support.
Thank you very much for your review.
In environments where the optimization is not available (e.g. CUDA 10 or lower), this test will always fail. Is that still OK?
In general API support should not depend on available optimizations. For such cases we usually have fallback generic code.
Do you mean that it fails on the CV_Assert(srcType == CV_8UC1); assertion?
It makes sense to raise an Error::StsNotImplemented error instead of a generic assertion.
I believe it is OK in CUDA cases to skip the test if an Error::StsNotImplemented error is raised.
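The suggestion above (report an unsupported format as "not implemented" and let the test skip instead of fail) can be sketched as follows. This is a minimal standalone illustration, not the PR's code: `NotImplemented` is a hypothetical stand-in for an OpenCV exception carrying Error::StsNotImplemented, and `createMedianFilterSketch`, `runMedianTest`, `Depth`, and `TestResult` are names invented for this example.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for an OpenCV exception with code Error::StsNotImplemented.
struct NotImplemented : std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class Depth { U8, U16, F32 };          // illustrative subset of pixel depths
enum class TestResult { Passed, Skipped };

// Hypothetical factory. 'hasWaveletMatrix' models whether the optimized
// path was compiled in (e.g. CUDA 11.0 or newer).
void createMedianFilterSketch(Depth depth, int channels, bool hasWaveletMatrix) {
    assert(channels >= 1);
    // Without the optimized path, only 8-bit single-channel input is handled.
    // Report that as "not implemented" rather than tripping a generic assertion.
    if (!hasWaveletMatrix && (depth != Depth::U8 || channels != 1))
        throw NotImplemented("median filter: format not supported by the fallback path");
}

// Test-side logic: a "not implemented" error means skip, not fail.
TestResult runMedianTest(Depth depth, int channels, bool hasWaveletMatrix) {
    try {
        createMedianFilterSketch(depth, channels, hasWaveletMatrix);
    } catch (const NotImplemented&) {
        return TestResult::Skipped;
    }
    return TestResult::Passed;
}
```

With this shape the same test list can run in every build: formats the fallback cannot handle are reported as skipped, and any other failure still surfaces as a failure.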
@@ -0,0 +1,1011 @@
#ifndef __OPENCV_WAVELET_MATRIX_2D_CUH__
Please add OpenCV license header in added files: https://github.com/opencv/opencv/wiki/Coding_Style_Guide#file-structure
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
If this code is borrowed from somewhere then please add original license.
…the Mat is not supported
@opencv-alalek I have made the requested changes (fixed the tests and added the license). Please check again at your convenience.
@asmorkalov Thank you for reviewing and merging the pull request. The code was very complex, but I appreciate your effort.
I replaced the existing CUDA implementation of the histogram-based median filter with an implementation of a new wavelet matrix-based median filter algorithm, which I presented at SIGGRAPH Asia 2022.
This paper won the Best Paper Award in the journal track of technical papers (ACM Transactions on Graphics).
This new algorithm, like the histogram method, has the property that the window radius does not affect the computation time, and is several times faster than the histogram method. Furthermore, while the histogram method does not support HDR and only supports 8U images, this new algorithm supports HDR and also supports 16U and 32F images.
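To illustrate why the query cost does not depend on the window size, here is a small CPU sketch of a one-dimensional wavelet matrix that answers "k-th smallest value in a range" in a number of steps equal to the bit width. It is not the PR's implementation, which is a 2D, GPU-based structure; the class name `WaveletMatrix1D` and its interface are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative 1D wavelet matrix over 16-bit values (hypothetical class).
class WaveletMatrix1D {
public:
    explicit WaveletMatrix1D(std::vector<std::uint16_t> values, int bits = 16)
        : n_(values.size()), bits_(bits), zeros_(bits), rank0_(bits) {
        assert(bits >= 1 && bits <= 16);
        std::vector<std::uint16_t> next(n_);
        for (int level = bits_ - 1; level >= 0; --level) {
            auto& rank = rank0_[level];
            rank.assign(n_ + 1, 0);
            // rank[i] = number of elements among the first i whose bit at 'level' is 0.
            for (std::size_t i = 0; i < n_; ++i)
                rank[i + 1] = rank[i] + (((values[i] >> level) & 1) == 0 ? 1 : 0);
            zeros_[level] = rank[n_];
            // Stable partition: elements with bit 0 first, then elements with bit 1.
            std::size_t zi = 0, oi = zeros_[level];
            for (std::size_t i = 0; i < n_; ++i) {
                if (((values[i] >> level) & 1) == 0) next[zi++] = values[i];
                else next[oi++] = values[i];
            }
            values.swap(next);
        }
    }

    // k-th smallest (0-based) among original positions [l, r).
    // The loop runs 'bits_' times regardless of the range length r - l.
    std::uint16_t kthSmallest(std::size_t l, std::size_t r, std::size_t k) const {
        assert(l < r && r <= n_ && k < r - l);
        unsigned result = 0;
        for (int level = bits_ - 1; level >= 0; --level) {
            const auto& rank = rank0_[level];
            const std::size_t zerosInRange = rank[r] - rank[l];
            if (k < zerosInRange) {
                // Answer has bit 0 here: follow the zero side.
                l = rank[l];
                r = rank[r];
            } else {
                // Answer has bit 1 here: skip the zeros and follow the one side.
                k -= zerosInRange;
                result |= 1u << level;
                l = zeros_[level] + (l - rank[l]);
                r = zeros_[level] + (r - rank[r]);
            }
        }
        return static_cast<std::uint16_t>(result);
    }

    // Median of [l, r) (upper median for even-length ranges).
    std::uint16_t median(std::size_t l, std::size_t r) const {
        return kthSmallest(l, r, (r - l) / 2);
    }

private:
    std::size_t n_;
    int bits_;
    std::vector<std::size_t> zeros_;               // zero count per level
    std::vector<std::vector<std::size_t>> rank0_;  // prefix zero counts per level
};
```

Because the structure is indexed by bit level rather than by a 256-bin histogram, the same code works for 16-bit data; handling 32F would additionally need an order-preserving mapping from floats to integers, which is outside this sketch.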
I (the author) have published the implementation on my personal GitHub and adapted it so that it can be called from OpenCV. The implementation uses the CUB library, which has been part of the standard toolkit since CUDA 11.0. The code therefore checks CUDA_VERSION and uses the new algorithm for versions 11.0 and above, and the existing histogram method for versions 10 and below.
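The version-based selection described above could look roughly like the following. This is a sketch under assumptions: the macro name is taken from the diff in this PR, but how the PR actually defines it is not shown here; the fallback value for CUDA_VERSION and the helpers `useWaveletMatrix` and `medianBackendName` are hypothetical and exist only so the example compiles without the CUDA toolkit.

```cpp
#include <cassert>

// CUDA_VERSION normally comes from <cuda.h> (e.g. 11000 for CUDA 11.0).
// The fallback below is an assumption so this sketch builds without the toolkit.
#ifndef CUDA_VERSION
#define CUDA_VERSION 11060
#endif

// Enable the wavelet matrix path only when CUB ships with the toolkit.
#if CUDA_VERSION >= 11000
#define __OPENCV_USE_WAVELET_MATRIX_FOR_MEDIAN_FILTER_CUDA__
#endif

// Hypothetical helper expressing the same rule as a testable function.
constexpr bool useWaveletMatrix(int cudaVersion) {
    return cudaVersion >= 11000;
}

// Hypothetical helper reporting which path this translation unit was built with.
inline const char* medianBackendName() {
#ifdef __OPENCV_USE_WAVELET_MATRIX_FOR_MEDIAN_FILTER_CUDA__
    return "wavelet_matrix";
#else
    return "histogram";
#endif
}
```

Keeping the decision in one macro means the rest of the code only has to test a single symbol, which is also what the test file in the diff does.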
Regarding the old histogram-based code: the CPU version of the median filter supports 16U and 32F for window sizes up to 5, but the histogram-based CUDA version does not. The supported channel counts also differ: the CPU version supports 1, 3, and 4 channels, while the CUDA version supports only 1 channel. In addition, the CUDA histogram method set pixels at the edges of the image, i.e. where the window does not fit inside the image, to zero. For example, with a window size of 7, a 3-pixel-wide border at the top, bottom, left, and right was not calculated correctly. The existing tests compared against the CPU version after cropping the edges with a rect, and the cropped area was wider than necessary: 8 pixels were cropped from the top, bottom, left, and right when the window size was 7.
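The border arithmetic in the paragraph above can be written out as a small helper. This is only an illustration of the calculation, not the expression used in the OpenCV tests; `Rect` here is a plain struct and `validRegion` is a hypothetical name.

```cpp
#include <cassert>

// Plain rectangle type for the sketch (not cv::Rect).
struct Rect {
    int x, y, width, height;
};

// Region in which a kernelSize x kernelSize window fits entirely inside
// a width x height image. The border on each side is kernelSize / 2 pixels.
inline Rect validRegion(int width, int height, int kernelSize) {
    assert(kernelSize >= 1 && kernelSize % 2 == 1);  // odd window sizes only
    const int radius = kernelSize / 2;               // 7 -> 3
    assert(width > 2 * radius && height > 2 * radius);
    return Rect{radius, radius, width - 2 * radius, height - 2 * radius};
}
```

For a window size of 7 the radius is 3, so cropping 3 pixels from each side is enough to exclude every pixel whose window leaves the image; cropping 8 pixels discards valid results from the comparison.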
In this PR, I first corrected the rect range for the tests so that both the old histogram method and the new wavelet matrix method can pass. Also, the CUDA version now supports 16U, 32F, and multi-channel formats such as 3 and 4 channels. In addition, while the CPU version only supports window sizes up to 5 for HDR, the new CUDA Wavelet Matrix method supports sizes of 7 and above. Additionally, I have added new tests for 16U, 32F, and multi-channel formats, specifically 3 and 4 channels.
Paper’s project page: Constant Time Median Filter using 2D Wavelet Matrix | Interactive Graphics & Engineering Lab
My implementation (as author): GitHub - TumoiYorozu/WMatrixMedian
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [ ] There is a reference to the original bug report and related work
Patch to opencv_extra has the same branch name.