Add alpaka::getPreferredWarpSize(dev)
#2216
Conversation
alpaka::getPreferredWarpSize(dev) returns one of the warp sizes supported by the device. On devices that support a single warp size (CPU, CUDA GPU, ROCm GPU), getPreferredWarpSize(dev) avoids the overhead of wrapping that value in an std::vector. On devices that support multiple warp sizes, the value returned by getPreferredWarpSize(dev) is unspecified. Currently it returns the largest supported value -- but this could change in a future version of alpaka. Signed-off-by: Andrea Bocci <[email protected]>
5e2db10 to 2b368fc
I am just curious about the purpose of this API. Is the main goal to avoid the heap allocation of the std::vector? Because we could just change the API to either return e.g. a …
@@ -181,10 +182,22 @@ namespace alpaka::trait
    auto find64 = std::find(warp_sizes.begin(), warp_sizes.end(), 64);
    if(find64 != warp_sizes.end())
        warp_sizes.erase(find64);
    // Sort the warp sizes in decreasing order
    std::sort(warp_sizes.begin(), warp_sizes.end(), std::greater<>{});
[Nit] If the vector can somehow be large, sorting first and then searching is faster than finding, deleting, and then sorting: after the one-time O(n log n) sort, each lookup with std::lower_bound takes O(log n) time, whereas std::find is linear.
The largest vector I encountered had 5 elements: { 4, 8, 16, 32, 64 }.
@mehmetyusufoglu your analysis is correct, but I am with @fwyzard: the supported warp sizes are probably a small set here :) Also, binary search is slower than linear search for small sizes due to its data-dependent access pattern. So @fwyzard's version is probably faster for our use case :D
auto find64 = std::find(warp_sizes.begin(), warp_sizes.end(), 64);
if(find64 != warp_sizes.end())
    warp_sizes.erase(find64);
Btw, in C++20, this should be just std::erase(warp_sizes, 64). Looking forward to the upgrade :)
@fwyzard if you want the PR merged, please mark the PR as Ready for review, thx!
Thanks for the review. I've marked it as a draft because I want to figure out first how it interacts with caching the device information.
Add a test for alpaka::getPreferredWarpSize(dev).