2D clfft segfault #10

Open
BeauJoh opened this issue Sep 7, 2017 · 0 comments
BeauJoh commented Sep 7, 2017

Hi folks,

I've extended the clfft code in my branch to accept arbitrary power-of-two sizes for both the 1D and 2D use cases. While doing this I noticed a potential bug in a code path that is not currently hit. If in the future you need to select problem sizes greater than 128 elements in the 2nd dimension, like so, ./clfft -p 0 -d 0 -t 0 -- --2D --pts1 8192 --pts2 256 then you'll get a segmentation fault.

After a full day of hacking on this code, I found that the cause is an incorrect input and output offset in the fft2 kernel. This offset is actually generated when determining the local and global workgroup sizes at line 470 of fftlib.cpp. The bug is in the call getGlobalDimension(localsz1, globalsz1, fftn1, fftn2, 0); to a function which is undocumented and quite gnarly. I believe this is a typo, since this is the only location where the 3rd argument (here fftn1) is set to anything other than 1.

As far as I can tell, the 3rd argument is the preferred local workgroup size, and the last argument indicates whether a vertical or horizontal kernel invocation is needed. The function returns the local workgroup size in its 1st argument and the number of workgroups required in its 2nd. That count is converted from CUDA style (number of workgroups) to OpenCL style (total number of work-items) on the next line with globalsz2 = globalsz2 * localsz2;.
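To make the CUDA-to-OpenCL conversion mentioned above concrete, here is a minimal sketch. The helper names toOpenCLGlobalSize and isValidNDRange are hypothetical and do not appear in fftlib.cpp; they only illustrate the arithmetic behind globalsz2 = globalsz2 * localsz2, assuming the value returned by getGlobalDimension really is a workgroup count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of fftlib.cpp): turn a CUDA-style launch
// description (number of workgroups plus work-items per group) into the
// total global work size that OpenCL's clEnqueueNDRangeKernel expects.
inline std::size_t toOpenCLGlobalSize(std::size_t numWorkGroups,
                                      std::size_t localSize) {
    assert(localSize > 0 && "local workgroup size must be non-zero");
    return numWorkGroups * localSize;
}

// Hypothetical sanity check: in OpenCL 1.x the global size must be a whole
// multiple of the local size, otherwise the enqueue is rejected.
inline bool isValidNDRange(std::size_t globalSize, std::size_t localSize) {
    return localSize != 0 && globalSize % localSize == 0;
}
```

For example, 4 workgroups of 64 work-items give a global size of 256. If the workgroup count or local size fed into this step is wrong, the kernel is launched over the wrong index range, which is consistent with the bad offsets described above.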

I've corrected it in my branch, but if you don't approve my pull request I'd suggest changing the calls to getGlobalDimension(localsz1, globalsz1, 1, fftn1, 0); on line 465 and getGlobalDimension(localsz2, globalsz2, 1, fftn2, 1); on line 470.

Additionally, I don't believe this was based on the SHOC codes as indicated in the README, since SHOC only performs 1D computation. It is, however, very close in implementation to the paper presented here. I just wish I could find the code base from which this was adapted.

Beau.

p.s. the relevant changes can be found in this commit

BeauJoh changed the title from "2D clfft bug" to "2D clfft segfault" on Sep 7, 2017