
CUDA: Allow for more thread blocks than the X dimension of the block grid #41

Merged 2 commits on Apr 13, 2020

Conversation

@pavanbalaji (Contributor) commented Apr 13, 2020

Pull Request Description

This PR allows us to use as many thread blocks as the three dimensions of the block grid allow in combination, rather than being limited to the X dimension alone.

Expected Impact

This would allow us to pack/unpack larger data sizes than before.
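To illustrate the idea, here is a hypothetical host-side sketch (not yaksa's actual code) of decomposing a flat block count across the three grid dimensions. The limits mirror CUDA's documented maxima (2^31 - 1 for gridDim.x, 65535 for gridDim.y and gridDim.z); real code should query cudaDeviceProp.maxGridSize instead of hard-coding them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits, mirroring CUDA's documented grid maxima. */
#define MAX_GRID_X  ((uint64_t) 2147483647)
#define MAX_GRID_YZ ((uint64_t) 65535)

/* Split a flat block count across the x, y, z grid dimensions.
 * Fill x first, then spill the remainder into y and z.  The product
 * dims[0] * dims[1] * dims[2] may exceed n_blocks, so the kernel must
 * bounds-check its computed global block index. */
static void split_blocks(uint64_t n_blocks, uint64_t dims[3])
{
    if (n_blocks == 0)
        n_blocks = 1;   /* always launch at least one block */
    dims[0] = n_blocks < MAX_GRID_X ? n_blocks : MAX_GRID_X;
    uint64_t rest = (n_blocks + dims[0] - 1) / dims[0];   /* ceil */
    dims[1] = rest < MAX_GRID_YZ ? rest : MAX_GRID_YZ;
    dims[2] = (rest + dims[1] - 1) / dims[1];             /* ceil */
}
```

A kernel launched with these dimensions would reconstruct its flat block index as `blockIdx.x + (uint64_t) gridDim.x * (blockIdx.y + (uint64_t) gridDim.y * blockIdx.z)` and return early when that index is at or past the true block count.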

Author Checklist

@pavanbalaji pavanbalaji self-assigned this Apr 13, 2020
@pavanbalaji pavanbalaji requested a review from yfguo April 13, 2020 03:50
@pavanbalaji (Contributor Author):

This PR fixes #17

@pavanbalaji pavanbalaji added this to the yaksa-1.0b1 milestone Apr 13, 2020
@pavanbalaji pavanbalaji linked an issue Apr 13, 2020 that may be closed by this pull request
Even though we use a single dimension right now, we should send all
three dimensions to the kernel.  This allows us to eventually tune the
number of dimensions used.

Signed-off-by: Pavan Balaji <[email protected]>
This allows us to handle much larger pack/unpack sizes, and should be
sufficient for the foreseeable future.

Fixes pmodels#17

Signed-off-by: Pavan Balaji <[email protected]>
@pavanbalaji pavanbalaji merged commit 8465514 into pmodels:master Apr 13, 2020
@pavanbalaji pavanbalaji deleted the pr/thread-blocks branch April 13, 2020 21:30
@gcongiu (Contributor) left a comment:

The PR looks nice; I have only one comment.

*n_threads = THREAD_BLOCK_SIZE;
uint64_t n_blocks = count * cuda_type->num_elements / THREAD_BLOCK_SIZE;
n_blocks += !!(count * cuda_type->num_elements % THREAD_BLOCK_SIZE);

A reviewer (Contributor) asked:

For correctness, should this return an error code if the number of blocks exceeds the max allowed size? Or simply assert?

@gcongiu (Contributor) commented Apr 13, 2020:

Oh I think I have commented too late :)

@pavanbalaji (Contributor Author) replied:

That would be more than the size of int64_t. At that point, we'd need to change a whole lot of code in yaksa to make it work, and an assert would not be sufficient.
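For reference, the rounding in the snippet quoted above can be illustrated standalone: `!!` maps any nonzero remainder to 1, giving a ceiling division without floating point. The THREAD_BLOCK_SIZE value here is only an assumed example, not necessarily yaksa's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-block thread count for illustration. */
#define THREAD_BLOCK_SIZE ((uint64_t) 256)

/* Number of thread blocks needed to cover n_elements elements,
 * rounding up when n_elements is not a multiple of the block size. */
static uint64_t ceil_blocks(uint64_t n_elements)
{
    uint64_t n_blocks = n_elements / THREAD_BLOCK_SIZE;
    n_blocks += !!(n_elements % THREAD_BLOCK_SIZE);  /* +1 on any remainder */
    return n_blocks;
}
```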

Development

Successfully merging this pull request may close these issues.

CUDA: respect maximum number of thread blocks
3 participants