Releases: ROCm/hipCUB

hipCUB 3.3.0 for ROCm 6.3.1

20 Dec 16:12
a4b3b19

hipCUB code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.

hipCUB 3.3.0 for ROCm 6.3.0

03 Dec 19:49
a4b3b19

Added

  • Support for large indices in hipcub::DeviceSegmentedReduce::* has been added, with the exception of DeviceSegmentedReduce::Arg*. Although rocPRIM's backend provides support for all reduce variants, CUB does not support large indices in DeviceSegmentedReduce::Arg*. For this reason, large index support is not available for hipcub::DeviceSegmentedReduce::Arg*.

Changed

  • Changed the default value of rmake.py -a to default_gpus. This is equivalent to gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201.
  • The NVIDIA backend now requires CUB, Thrust, and libcu++ 2.3.2.

Resolved issues

  • Fixed an issue in rmake.py where the list storing cmake options would contain individual characters instead of a full string of options.
  • Fixed an issue where config.hpp was not included in all hipCUB headers, resulting in build errors.

hipCUB 3.2.1 for ROCm 6.2.4

06 Nov 19:55
a2cf6f9

Added

  • Added support for gfx1151.

hipCUB 3.2.0 for ROCm 6.2.2

27 Sep 16:01
1875530

hipCUB code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.

hipCUB 3.2.0 for ROCm 6.2.1

20 Sep 19:57
1875530

hipCUB code for ROCm 6.2.1 did not change. The library was rebuilt for the updated ROCm 6.2.1 stack.

hipCUB 3.2.0 for ROCm 6.2.0

02 Aug 16:15
1875530

Added

  • Added the DeviceCopy function for parity with CUB.

  • In the rocPRIM backend, added enum WarpExchangeAlgorithm, which is used as the new optional template argument for WarpExchange.

    • The potential values for the enum are WARP_EXCHANGE_SMEM and WARP_EXCHANGE_SHUFFLE.
    • WARP_EXCHANGE_SMEM stands for the previous algorithm, while WARP_EXCHANGE_SHUFFLE performs the exchange via shuffle operations.
    • WARP_EXCHANGE_SHUFFLE does not require any pre-allocated shared memory, but ItemsPerThread must be a divisor of WarpSize.
  • Added tuple.hpp which defines templates hipcub::tuple, hipcub::tuple_element, hipcub::tuple_element_t and hipcub::tuple_size.

  • Added new overloaded member functions to BlockRadixSort and DeviceRadixSort that expose a decomposer argument. Keys of a custom
    type (key_type) can be sorted via these overloads, if an appropriate decomposer is passed. The decomposer has to implement
    operator(const key_type&) which returns a hipcub::tuple of references pointing to members of key_type.

  • On AMD GPUs (using the HIP backend), it is possible to issue hipCUB API calls inside of
    hipGraphs, with several exceptions:

    • CachingDeviceAllocator
    • GridBarrierLifetime
    • DeviceSegmentedRadixSort
    • DeviceRunLengthEncode

    Currently, these classes rely on one or more synchronous calls to function correctly, so they cannot be used inside of hipGraphs.
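
The decomposer described in the list above can be sketched on the host. This is a minimal illustration, not the hipCUB implementation: `custom_key`, `custom_key_decomposer`, and `sort_with_decomposer` are hypothetical names, `std::tuple` stands in for `hipcub::tuple` so the sketch compiles without ROCm, and `std::sort` stands in for the device radix sort. It assumes the sort orders keys lexicographically by the decomposed members, first member most significant.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical custom key type; not part of hipCUB.
struct custom_key
{
    int   major;
    float minor;
};

// Decomposer in the shape the release notes describe: operator() takes the
// key and returns a tuple of references to its members.
// Assumption: std::tuple is used here in place of hipcub::tuple.
struct custom_key_decomposer
{
    std::tuple<const int&, const float&> operator()(const custom_key& key) const
    {
        return std::tie(key.major, key.minor);
    }
};

// Host-side stand-in for the device sort: compares the decomposed tuples
// lexicographically, so `major` is compared first and `minor` breaks ties.
inline void sort_with_decomposer(std::vector<custom_key>& keys)
{
    const custom_key_decomposer decomposer{};
    auto less = [&](const custom_key& a, const custom_key& b)
    { return decomposer(a) < decomposer(b); };

    std::sort(keys.begin(), keys.end(), less);
    assert(std::is_sorted(keys.begin(), keys.end(), less));
}
```

In device code the same decomposer object would be passed to the new BlockRadixSort or DeviceRadixSort overloads; the exact overload signatures are not given in these notes and should be taken from the hipCUB API reference.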

Changed

  • The NVIDIA backend now requires CUB, Thrust, and libcu++ 2.2.0. If they are not found, they are downloaded from the NVIDIA CCCL repository.

Fixed

  • Fixed the accumulator type derivation for device scan algorithms in the rocPRIM backend, which differed from CUB's.
    The accumulator type is now derived as the result type of the binary operator.
  • debug_synchronous has been deprecated in hipCUB-2.13.2, and it no longer has any effect. With this release, passing debug_synchronous
    to the device functions results in a deprecation warning both at runtime and at compile time.
    • The synchronization that was previously achievable by passing debug_synchronous=true can now be achieved at compile time
      by setting the CUB_DEBUG_SYNC (or higher debug level) or the HIPCUB_DEBUG_SYNC preprocessor definition.
    • The compile time deprecation warnings can be disabled by defining the HIPCUB_IGNORE_DEPRECATED_API preprocessor definition.
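
The accumulator type change in the first item above can be illustrated with a host-side sketch. It is not hipCUB code: `accumulator_t`, `widening_sum`, and `inclusive_scan_host` are hypothetical names chosen for illustration. The sketch only shows the rule stated in the notes, that the accumulator type is the result type of the binary operator, and why that matters when the operator returns a wider type than the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical scan operator: takes and returns 32-bit integers, so applying
// it to 8-bit inputs yields a 32-bit result.
struct widening_sum
{
    std::int32_t operator()(std::int32_t a, std::int32_t b) const { return a + b; }
};

// Accumulator type derived as the (decayed) result of applying the binary
// operator to two input values. Illustrative alias, not a hipCUB identifier.
template <class BinaryOp, class InputT>
using accumulator_t = typename std::decay<decltype(
    std::declval<BinaryOp>()(std::declval<InputT>(), std::declval<InputT>()))>::type;

static_assert(std::is_same<accumulator_t<widening_sum, std::uint8_t>, std::int32_t>::value,
              "accumulator follows the operator's result type, not the input type");

// Sequential inclusive scan on the host that accumulates in accumulator_t.
template <class InputT, class BinaryOp>
std::vector<accumulator_t<BinaryOp, InputT>>
inclusive_scan_host(const std::vector<InputT>& in, BinaryOp op)
{
    using acc_t = accumulator_t<BinaryOp, InputT>;
    std::vector<acc_t> out;
    out.reserve(in.size());
    if(!in.empty())
    {
        acc_t acc = in[0];
        out.push_back(acc);
        for(std::size_t i = 1; i < in.size(); ++i)
        {
            acc = op(acc, in[i]);
            out.push_back(acc);
        }
    }
    assert(out.size() == in.size());
    return out;
}
```

With 8-bit inputs 200, 100, 50, accumulating in the operator's 32-bit result type gives 200, 300, 350; accumulating in the 8-bit input type instead would wrap around after the first element.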

hipCUB 3.1.0 for ROCm 6.1.2

04 Jun 16:53
44aa2e0

hipCUB code for ROCm 6.1.2 did not change. The library was rebuilt for the updated ROCm 6.1.2 stack.

hipCUB 3.1.0 for ROCm 6.1.1

08 May 17:59
44aa2e0

hipCUB code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.

hipCUB 3.1.0 for ROCm 6.1.0

16 Apr 19:09
44aa2e0

Changed

  • The CUB backend now references CUB and Thrust version 2.1.0.
  • Updated HIPCUB_HOST_WARP_THREADS macro definition to match host_warp_size changes from rocPRIM 3.0.
  • Implemented __int128_t and __uint128_t support for radix_sort.

Fixed

  • Fixed build issues with rmake.py on Windows when using VS 2017 15.8 or later due to a breaking fix with extended aligned storage.

Added

  • Added interface DeviceMemcpy::Batched for batched memcpy from rocPRIM and CUB.

hipCUB 3.0.0 for ROCm 6.0.2

31 Jan 20:12
761fccb

hipCUB code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.