Hi folks,

I've extended the `clfft` code in my branch to accept arbitrary power-of-two sizes for both the 1D and 2D use cases. In doing this I noticed a potential bug in a code path that is not currently hit. However, if in the future you need to select a problem size greater than 128 elements in the second dimension, like so:

`./clfft -p 0 -d 0 -t 0 -- --2D --pts1 8192 --pts2 256`

then you'll get a segmentation fault. After a full day of hacking on this code, I found that the cause is an incorrect input and output offset in the `fft2` kernel. This offset is derived from the local and global workgroup sizes computed at line 470 of fftlib.cpp. The bug is in the call `getGlobalDimension(localsz1, globalsz1, fftn1, fftn2, 0);`, a function which is undocumented and quite gnarly. I believe this is a typo, since this is the only location where the third argument (here `fftn1`) is set to anything other than 1. I believe this argument is the preferred local workgroup size, and the last argument indicates whether a vertical or horizontal kernel invocation is needed. The function then returns the local workgroup size in its first argument and the number of workgroups required in its second. The latter is converted from CUDA style to OpenCL style on the next line with `globalsz2 = globalsz2 * localsz2;`.
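For reference, here is how I read the helper's signature. This is a minimal sketch reconstructed from the call sites, not the actual fftlib.cpp implementation; the parameter names (`preferredLocal`, `numGroups`, `vertical`) and the illustrative body are my own:

```cpp
// Sketch only -- my reading of the undocumented helper, not the real code.
#include <cstddef>

void getGlobalDimension(size_t &localsz,        // out: local work-group size
                        size_t &numGroups,      // out: number of work-groups, CUDA-grid style
                        size_t preferredLocal,  // in:  preferred local size (1 at every other call site)
                        size_t fftn,            // in:  FFT length of the dimension being launched
                        int    vertical)        // in:  0 = horizontal pass, 1 = vertical pass
{
    // Hypothetical body purely for illustration: cover fftn elements with
    // groups of at most the preferred local size.
    localsz   = preferredLocal < fftn ? preferredLocal : fftn;
    numGroups = (fftn + localsz - 1) / localsz;
    (void)vertical; // the real helper presumably also accounts for the pass direction
}
```

Under this reading, passing anything other than 1 as the third argument changes both outputs, which is consistent with the bad offsets described above.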
I've corrected it in my branch, but if you don't approve my pull request, I'd suggest changing line 465 to `getGlobalDimension(localsz1, globalsz1, 1, fftn1, 0);` and line 470 to `getGlobalDimension(localsz2, globalsz2, 1, fftn2, 1);`.
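To make the intent concrete, here is a hedged sketch of how the corrected second-dimension call would feed into the OpenCL launch. Only the `getGlobalDimension` call and the `globalsz2 = globalsz2 * localsz2;` conversion come from fftlib.cpp; the wrapper function, `queue`, and `fft2_kernel` are placeholders of my own, and the sketch assumes the helper behaves as read above:

```cpp
#include <CL/cl.h>   // OpenCL host API
#include <cstddef>

// Assumes the getGlobalDimension helper sketched above.
void getGlobalDimension(size_t &localsz, size_t &numGroups,
                        size_t preferredLocal, size_t fftn, int vertical);

// 'queue' and 'fft2_kernel' are placeholder handles, not names from fftlib.cpp.
void launchSecondDimension(cl_command_queue queue, cl_kernel fft2_kernel, size_t fftn2)
{
    size_t localsz2, globalsz2;
    // Preferred local size of 1, vertical (second-dimension) pass -- the
    // corrected call suggested for line 470.
    getGlobalDimension(localsz2, globalsz2, 1, fftn2, 1);

    // The helper reports a CUDA-style work-group count; clEnqueueNDRangeKernel
    // expects the total number of work-items, hence the multiplication.
    globalsz2 = globalsz2 * localsz2;

    clEnqueueNDRangeKernel(queue, fft2_kernel,
                           1,            // one-dimensional NDRange
                           NULL,         // no global work offset
                           &globalsz2,   // total work-items
                           &localsz2,    // work-items per group
                           0, NULL, NULL);
}
```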
Additionally, I don't believe this code was based on the SHOC codes as indicated in the README, since SHOC only performs the 1D computation. It is, however, very close in implementation to the paper presented here. I just wish I could find the code base from which this was adapted.
Beau.
P.S. The relevant changes can be found in this commit.