CUDA: Allow for more thread blocks than the X dimension of the block grid #41
This PR fixes #17.
Branch updated from 71ac217 to b473d98.
Even though we use only a single dimension right now, we should pass all three grid dimensions to the kernel. This will let us tune the number of dimensions used later on.

Signed-off-by: Pavan Balaji <[email protected]>
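Passing all three grid dimensions means the kernel has to fold a 3D block index back into one linear position. The following is a minimal sketch of that flattening, assuming X varies fastest; it is not necessarily the ordering yaksa uses. It is written as host-side C so it can be checked on a CPU. The `grid3` type and the `linear_block_id` and `global_thread_id` helpers are hypothetical stand-ins for CUDA's built-in `blockIdx`, `gridDim`, `threadIdx`, and `blockDim`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for CUDA's dim3; a real kernel would
 * read the built-in blockIdx, gridDim, threadIdx, and blockDim instead. */
typedef struct {
    uint64_t x, y, z;
} grid3;

/* Flatten a 3D block index into one linear block id (X varies fastest). */
static uint64_t linear_block_id(grid3 block_idx, grid3 grid_dim)
{
    assert(block_idx.x < grid_dim.x);
    assert(block_idx.y < grid_dim.y);
    assert(block_idx.z < grid_dim.z);
    return block_idx.x
        + block_idx.y * grid_dim.x
        + block_idx.z * grid_dim.x * grid_dim.y;
}

/* Global element index handled by one thread of a 1D thread block. */
static uint64_t global_thread_id(grid3 block_idx, grid3 grid_dim,
                                 uint64_t thread_idx_x, uint64_t block_dim_x)
{
    return linear_block_id(block_idx, grid_dim) * block_dim_x + thread_idx_x;
}
```

In a CUDA kernel the same expression would use `blockIdx.x + blockIdx.y * gridDim.x + blockIdx.z * gridDim.x * gridDim.y`, cast to a 64-bit type first because the built-ins are 32-bit unsigned values. When only X is used, `blockIdx.y` and `blockIdx.z` are 0 and the expression reduces to `blockIdx.x`.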
This allows us to handle much larger pack/unpack sizes, which should be sufficient for the foreseeable future.

Fixes pmodels#17

Signed-off-by: Pavan Balaji <[email protected]>
Branch updated from b473d98 to 591dc0a.
The PR looks nice; I have only one comment.
```c
*n_threads = THREAD_BLOCK_SIZE;
uint64_t n_blocks = count * cuda_type->num_elements / THREAD_BLOCK_SIZE;
n_blocks += ! !(count * cuda_type->num_elements % THREAD_BLOCK_SIZE);
```
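The last two lines of the snippet are a ceiling division: the quotient, plus one extra block when there is a remainder. Here is a minimal C sketch of the same idiom. The helper name `ceil_div_u64` is hypothetical and does not come from yaksa.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division written the same way as the reviewed code:
 * quotient plus one extra block when there is a remainder.
 * `!!x` turns any nonzero value into 1 and leaves 0 as 0. */
static uint64_t ceil_div_u64(uint64_t n, uint64_t d)
{
    assert(d != 0);
    return n / d + !!(n % d);
}
```

The common alternative `(n + d - 1) / d` can wrap around when `n` is close to `UINT64_MAX`. The quotient-plus-remainder form cannot, although the product `count * cuda_type->num_elements` that feeds it can still overflow on its own.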
For correctness, should this return an error code if the number of blocks exceeds the max allowed size? Or simply assert?
Oh, I think I commented too late :)
That would be more than int64_t can represent. At that point, we'd need to change a whole lot of code in yaksa to make it work, and an assert would not be sufficient.
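The reviewer asks whether to return an error code or assert. One way to illustrate the question is a checked version of the block-count computation. This is a minimal sketch and not what the PR merged. It assumes a 1D thread block. The helper `checked_num_blocks`, its status codes, and the `GRID_MAX_*` macros are hypothetical, although the limit values are CUDA's documented maximum grid dimensions for compute capability 3.0 and later.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum grid dimensions documented by CUDA for compute capability 3.0
 * and later. Their product, (2^31 - 1) * 65535 * 65535, still fits in
 * uint64_t (it is just below 2^63). */
#define GRID_MAX_X 2147483647ULL
#define GRID_MAX_Y 65535ULL
#define GRID_MAX_Z 65535ULL

/* Hypothetical status codes for this sketch (not yaksa's error codes). */
enum { BLOCKS_OK = 0, BLOCKS_OVERFLOW = 1, BLOCKS_TOO_MANY = 2 };

/* Compute ceil(count * num_elements / threads_per_block), returning an error
 * instead of wrapping if the element count or block count is out of range. */
static int checked_num_blocks(uint64_t count, uint64_t num_elements,
                              uint64_t threads_per_block, uint64_t *n_blocks)
{
    assert(threads_per_block != 0);

    /* count * num_elements wraps iff count > UINT64_MAX / num_elements. */
    if (num_elements != 0 && count > UINT64_MAX / num_elements)
        return BLOCKS_OVERFLOW;
    uint64_t total = count * num_elements;

    uint64_t blocks = total / threads_per_block + !!(total % threads_per_block);
    if (blocks > GRID_MAX_X * GRID_MAX_Y * GRID_MAX_Z)
        return BLOCKS_TOO_MANY;

    *n_blocks = blocks;
    return BLOCKS_OK;
}
```

Take a hypothetical block size of 256 threads. Even `UINT64_MAX` elements then need only 2^56 blocks, well below the grid limit, so the element-count multiplication overflows long before the grid runs out. This matches the author's point that the real limit lies in the element count rather than the grid.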
Pull Request Description
This PR allows us to use as many thread blocks as all three dimensions of the block grid allow combined, rather than being limited by the X dimension alone.
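The following minimal sketch shows one way to spread a large block count over all three grid dimensions: the host fills X first and spills the remainder into Y and then Z. This is an assumed approach, not the PR's actual code. `split_grid` and the `GRID_MAX_*` macros are hypothetical names; the limit values are CUDA's documented maximums for compute capability 3.0 and later.

```c
#include <assert.h>
#include <stdint.h>

/* CUDA's documented maximum grid dimensions (compute capability >= 3.0). */
#define GRID_MAX_X 2147483647ULL
#define GRID_MAX_Y 65535ULL
#define GRID_MAX_Z 65535ULL

/* Spread n_blocks over the X, Y, and Z grid dimensions, filling X first.
 * Returns 0 on success, -1 if the blocks do not fit in the grid.
 * The launched grid may contain more than n_blocks blocks; the kernel
 * is expected to skip any block whose linear id is >= n_blocks. */
static int split_grid(uint64_t n_blocks, uint64_t *gx, uint64_t *gy, uint64_t *gz)
{
    assert(n_blocks > 0);

    uint64_t x = n_blocks < GRID_MAX_X ? n_blocks : GRID_MAX_X;
    /* Number of X-rows needed, rounded up without overflow. */
    uint64_t rows = n_blocks / x + !!(n_blocks % x);

    uint64_t y = rows < GRID_MAX_Y ? rows : GRID_MAX_Y;
    uint64_t z = rows / y + !!(rows % y);
    if (z > GRID_MAX_Z)
        return -1;

    *gx = x;
    *gy = y;
    *gz = z;
    return 0;
}
```

This greedy X-first split can leave many idle blocks. For example, `GRID_MAX_X + 1` blocks become a `GRID_MAX_X` × 2 × 1 grid. A more balanced factorization would waste fewer blocks, at the cost of a more involved calculation.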
Expected Impact
This would allow us to pack/unpack larger data sizes than before.
Author Checklist
- Commit message is of the form `module: short description` and follows good practice