Add alpaka::getPreferredWarpSize(dev) #2216

Merged 1 commit into alpaka-group:develop on Jan 16, 2024

Conversation

@fwyzard (Contributor) commented Jan 9, 2024

alpaka::getPreferredWarpSize(dev) returns one of the possible warp sizes supported by the device.

On devices that support a single warp size (CPUs, CUDA GPUs, ROCm GPUs), getPreferredWarpSize(dev) avoids the overhead of wrapping that value in a std::vector.

On devices that support multiple warp sizes, the value returned by getPreferredWarpSize(dev) is unspecified. Currently it returns the largest supported value -- but this could change in a future version of alpaka.

Add a test for alpaka::getPreferredWarpSize(dev).
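For context, a minimal usage sketch of the new call (not part of the PR; the accelerator choice, includes, and platform setup are illustrative and assume the alpaka 1.x API):

#include <alpaka/alpaka.hpp>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    using Dim = alpaka::DimInt<1u>;
    using Idx = std::size_t;
    using Acc = alpaka::AccCpuSerial<Dim, Idx>;

    auto const platform = alpaka::Platform<Acc>{};
    auto const dev = alpaka::getDevByIdx(platform, 0);

    // New API: a single value, no std::vector allocation.
    std::size_t const preferred = alpaka::getPreferredWarpSize(dev);

    // Pre-existing API: all supported warp sizes, heap-allocated.
    std::vector<std::size_t> const all = alpaka::getWarpSizes(dev);

    std::cout << "preferred warp size: " << preferred << '\n';
}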

@fwyzard fwyzard marked this pull request as draft January 9, 2024 20:29
@fwyzard fwyzard force-pushed the alpaka_getPreferredWarpSize branch from 5e2db10 to 2b368fc on January 9, 2024 23:41
@bernhardmgruber (Member)

I am just curious about the purpose of this API. Is the main goal to avoid the heap allocation of auto getWarpSizes(TDev const& dev) -> std::vector<std::size_t>?

Because we could instead change the API to return e.g. a boost::small_vector, or cache the warp sizes in the device and return a std::span. The latter assumes that a device does not change its warp sizes during program execution.
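A rough sketch of that caching alternative, for illustration only (hypothetical types, not alpaka code), assuming the warp sizes are queried once and never change:

#include <cstddef>
#include <span>
#include <vector>

class Device
{
public:
    // Hand out a non-owning view into the cached sizes; no per-call heap allocation.
    std::span<std::size_t const> warpSizes() const
    {
        if(m_warpSizes.empty())
            m_warpSizes = queryWarpSizesFromRuntime(); // hypothetical helper
        return m_warpSizes;
    }

private:
    static std::vector<std::size_t> queryWarpSizesFromRuntime()
    {
        return {4, 8, 16, 32, 64}; // placeholder values
    }

    mutable std::vector<std::size_t> m_warpSizes; // lazily filled cache (not thread-safe as written)
};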

@@ -181,10 +182,22 @@ namespace alpaka::trait
auto find64 = std::find(warp_sizes.begin(), warp_sizes.end(), 64);
if(find64 != warp_sizes.end())
warp_sizes.erase(find64);
// Sort the warp sizes in decreasing order
std::sort(warp_sizes.begin(), warp_sizes.end(), std::greater<>{});
@mehmetyusufoglu (Contributor)
[Nit] If the vector could somehow be large, sorting first and then searching (logarithmic time) would be faster than finding, erasing, and then sorting (linear time). After sorting, std::lower_bound finds the element in O(log n) time.
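A sketch of that suggestion against the snippet above (illustrative; assumes <algorithm> is included, and note that std::lower_bound must use the same std::greater<>{} comparator once the vector is sorted in decreasing order):

// Sort once, then locate 64 with a binary search before erasing it.
std::sort(warp_sizes.begin(), warp_sizes.end(), std::greater<>{});
auto find64 = std::lower_bound(warp_sizes.begin(), warp_sizes.end(), 64, std::greater<>{});
if(find64 != warp_sizes.end() && *find64 == 64)
    warp_sizes.erase(find64);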

@fwyzard (Contributor, Author)
The largest vector I encountered had 5 elements: { 4, 8, 16, 32, 64 }.

Member
@mehmetyusufoglu your analysis is correct, but I am with @fwyzard: the supported warp sizes are probably a small set here :) Also, binary search is slower than linear search at small sizes because of its data-dependent access pattern. So @fwyzard's version is probably faster for our use case :D

Comment on lines 182 to 184
auto find64 = std::find(warp_sizes.begin(), warp_sizes.end(), 64);
if(find64 != warp_sizes.end())
warp_sizes.erase(find64);
Member
Btw, in C++20, this should be just std::erase(warp_sizes, 64). Looking forward to the upgrade :)
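For reference, the C++20 call collapses the find-and-erase above into a single line and also reports how many elements were removed (std::erase for std::vector lives in <vector>):

std::erase(warp_sizes, 64); // erases every element equal to 64; returns the count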

@bernhardmgruber (Member)

@fwyzard if you want the PR merged, please mark the PR as Ready for review, thx!

@fwyzard (Contributor, Author) commented Jan 10, 2024

Thanks for the review.

I've marked it as a draft because I want to figure out first how it interacts with caching the device information.

@fwyzard fwyzard marked this pull request as ready for review January 16, 2024 09:02
@fwyzard fwyzard merged commit 0fb8037 into alpaka-group:develop Jan 16, 2024
22 checks passed
@fwyzard fwyzard deleted the alpaka_getPreferredWarpSize branch January 16, 2024 09:05