Documentation for rocRAND is available at https://rocm.docs.amd.com/projects/rocRAND/en/latest/
- Updated several `gfx942` auto-tuning parameters.
- Fixed an issue where `mt19937.hpp` would cause kernel errors during auto-tuning.
- Added extended tests to `rtest.py`. These are extra tests that did not fit the criteria for smoke or regression tests, and they take much longer to run than either. Use `python rtest.py [--emulation|-e|--test|-t]=extended` to run these tests.
- Added regression tests to `rtest.py`. These tests recreate scenarios that have caused hardware problems in past emulation environments. Use `python rtest.py [--emulation|-e|--test|-t]=regression` to run these tests.
- Added smoke test options, which run a subset of the unit tests and ensure that less than 2 GB of VRAM is used. Use `python rtest.py [--emulation|-e|--test|-t]=smoke` to run these tests.
- Added the `--emulation` option for `rtest.py`.
- Removed a section in `cmake/Dependencies.cmake` that was forcing `DCMAKE_CXX_COMPILER` to be set to either `cl` or `g++` if the compiler was not `GNU`.
- `--test|-t` is no longer a required flag for `rtest.py`. Instead, the user can use either `--emulation|-e` or `--test|-t`, but not both.
- Removed TBB dependency for multi-core processing of host-side generation.
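As a rough illustration of the "either `--emulation|-e` or `--test|-t`, but not both" rule for `rtest.py` noted above, the following sketch models the two flags with a mutually exclusive `argparse` group. It is not taken from `rtest.py`: the helper names `build_parser` and `selected_suite` are hypothetical, and requiring exactly one of the two flags is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that accepts --emulation/-e or --test/-t, never both."""
    parser = argparse.ArgumentParser(prog="rtest.py")
    # Assumption: exactly one of the two flags has to be supplied.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-e", "--emulation", help="emulation suite, e.g. smoke")
    group.add_argument("-t", "--test", help="test suite to run")
    return parser


def selected_suite(argv):
    """Return the suite name given through whichever flag was used."""
    args = build_parser().parse_args(argv)
    return args.emulation if args.emulation is not None else args.test
```

With this layout, passing both flags makes `argparse` print a usage error and exit instead of silently preferring one of them.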
- Fixed an issue where `CMAKE_PREFIX_PATH` was not defined properly in `CMakeLists.txt` and `toolchain-linux.cmake`.
- Fixed an issue in `rmake.py` where `cmake_platform_opts` was sometimes a string instead of a list.
- Added host generator for MT19937
- Support for `rocrand_generate_poisson` in hipGraphs
- Added engine, distribution, mode, throughput_gigabytes_per_second, and lambda columns for the csv format in `benchmark_rocrand_host_api` and `benchmark_rocrand_device_api`. To see these new columns, set `--benchmark_format=csv` or `--benchmark_out_format=csv --benchmark_out="outName.csv"`.
- Updated the default value for the `-a` argument of `rmake.py` to `gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201`.
- `rocrand_discrete` for MTGP32, LFSR113 and ThreeFry generators now uses the alias method, which is faster than binary search in CDF.
- Fixed an issue in `rmake.py` where the list storing cmake options would contain individual characters instead of a full string of options.
- Fixed the "unknown extension ?>" issue in `scripts/config-tuning/select_best_config.py` when using a Python version older than 3.11.
- Fixed low random sequence quality of `ROCRAND_RNG_PSEUDO_THREEFRY2_64_20` and `ROCRAND_RNG_PSEUDO_THREEFRY4_64_20`.
- Added `rocrand_create_generator_host`
  - The following generators are supported:
    - `ROCRAND_RNG_PSEUDO_MRG31K3P`
    - `ROCRAND_RNG_PSEUDO_MRG32K3A`
    - `ROCRAND_RNG_PSEUDO_PHILOX4_32_10`
    - `ROCRAND_RNG_PSEUDO_THREEFRY2_32_20`
    - `ROCRAND_RNG_PSEUDO_THREEFRY2_64_20`
    - `ROCRAND_RNG_PSEUDO_THREEFRY4_32_20`
    - `ROCRAND_RNG_PSEUDO_THREEFRY4_64_20`
    - `ROCRAND_RNG_PSEUDO_XORWOW`
    - `ROCRAND_RNG_QUASI_SCRAMBLED_SOBOL32`
    - `ROCRAND_RNG_QUASI_SCRAMBLED_SOBOL64`
    - `ROCRAND_RNG_QUASI_SOBOL32`
    - `ROCRAND_RNG_QUASI_SOBOL64`
  - The host-side generators support multi-core processing. On Linux, this requires the TBB (Threading Building Blocks) development package to be installed on the system when building rocRAND (`libtbb-dev` on Ubuntu and derivatives).
    - If TBB is not found when configuring rocRAND, the configuration is still successful, and the host generators are executed on a single CPU thread.
- Added the option to create a host generator to the Python wrapper
- Added the option to create a host generator to the Fortran wrapper
- Added dynamic ordering. This ordering is free to rearrange the produced numbers, which can be specific to devices and distributions. It is implemented for:
  - XORWOW, MRG32K3A, MTGP32, Philox 4x32-10, MRG31K3P, LFSR113, and ThreeFry
- For the NVIDIA platform, compilation using clang as the host compiler is now supported.
- C++ wrapper:
  - `lfsr113_engine` now also supports being constructed with a seed of type `unsigned long long`, not only `uint4`.
  - Added optional order parameter to constructor of `mt19937_engine`
- Added the following functions for the `ROCRAND_RNG_PSEUDO_MTGP32` generator:
  - `rocrand_normal2`
  - `rocrand_normal_double2`
  - `rocrand_log_normal2`
  - `rocrand_log_normal_double2`
- Added `rocrand_create_generator_host_blocking`, which dispatches without stream semantics.
- Added host-side generator for `ROCRAND_RNG_PSEUDO_MTGP32`.
- Added offset and skipahead functionality to LFSR113 generator.
- Added dynamic ordering for architecture `gfx1102`.
- For device-side generators, you can now wrap calls to rocrand_generate_* inside of a hipGraph. There are a few things to be aware of:
  - Generator creation (rocrand_create_generator), initialization (rocrand_initialize_generator), and destruction (rocrand_destroy_generator) must still happen outside the hipGraph.
  - After the generator is created, you may call API functions to set its seed, offset, and order.
  - After the generator is initialized (but before stream capture or manual graph creation begins), use rocrand_set_stream to set the stream the generator will use within the graph.
  - A generator's seed, offset, and stream may not be changed from within the hipGraph. Attempting to do so may result in unpredictable behaviour.
  - API calls for the Poisson distribution (e.g. rocrand_generate_poisson) are not yet supported inside of hipGraphs.
  - For sample usage, see the unit tests in test/test_rocrand_hipgraphs.cpp
- Building rocRAND now requires a C++17 capable compiler, as the internal library sources now require it. However, consuming rocRAND is still possible from C++11, as the public headers don't make use of the new features.
- Building rocRAND should be faster on machines with multiple CPU cores, as the library has been split into multiple compilation units.
- C++ wrapper: the `min()` and `max()` member functions of the generators and distributions are now `static constexpr`.
- Renamed and unified the existing ROCRAND_DETAIL_.*_BM_NOT_IN_STATE to ROCRAND_DETAIL_BM_NOT_IN_STATE
- Static & dynamic library: moved all internal symbols to namespaces to avoid potential symbol name collisions when linking.
- Deprecated the following typedefs. Please use the unified `state_type` alias instead.
  - `rocrand_device::threefry2x32_20_engine::threefry2x32_20_state`
  - `rocrand_device::threefry2x64_20_engine::threefry2x64_20_state`
  - `rocrand_device::threefry4x32_20_engine::threefry4x32_20_state`
  - `rocrand_device::threefry4x64_20_engine::threefry4x64_20_state`
- Deprecated internal header: src/rng/distribution/distributions.hpp
- Deprecated internal header: src/rng/device_engines.hpp
- Removed references to and workarounds for deprecated hcc.
- Support for HIP-CPU
- SOBOL64 and SCRAMBLED_SOBOL64 generate Poisson-distributed `unsigned long long int` numbers instead of `unsigned int`. This will be fixed in the next major release.
- Added `rocrand_create_generator_host` with initial support for `ROCRAND_RNG_PSEUDO_PHILOX4_32_10` and `ROCRAND_RNG_PSEUDO_MRG31K3P`.
- Added the option to create a host generator to the Python wrapper
- Added the option to create a host generator to the Fortran wrapper
- Generator classes from `rocrand.hpp` are no longer copyable (in previous versions these copies would copy internal references to the generators and would lead to double free or memory leak errors)
  - These types should be moved instead of copied; move constructors and operators are now defined
- Improved MT19937 initialization and generation performance
- Removed the hipRAND submodule from rocRAND; hipRAND is now only available as a separate package
- Removed references to, and workarounds for, the deprecated hcc
- `mt19937_engine` from `rocrand.hpp` is now move-constructible and move-assignable (the move constructor and move assignment operator were deleted for this class)
- Various fixes for the C++ wrapper header `rocrand.hpp`
  - The name of `mrg31k3p` is now correctly spelled (was incorrectly named `mrg31k3a` in previous versions)
  - Added the missing `order` setter method for `threefry4x64`
  - Fixed the default ordering parameter for `lfsr113`
- Build error when using Clang++ directly resulting from unsupported `amdgpu-target` references
- Added hip::device as dependency to benchmark_rocrand_tuning to make it compile with amdclang++.
- Minor entropy waste in the 64-bit Threefry function producing two log-normally-distributed doubles.
- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator
- New benchmark APIs for Google Benchmark:
  - `benchmark_rocrand_device_api` replaces `benchmark_rocrand_kernel`
  - `benchmark_curand_host_api` replaces `benchmark_curand_generate`
  - `benchmark_curand_device_api` replaces `benchmark_curand_kernel`
- Experimental HIP-CPU feature
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, Parallel random numbers: as easy as 1, 2, 3
- Accessor methods for SOBOL 32 and 64 direction vectors and constants:
  - Enum `rocrand_direction_vector_set` to select the direction vector set
  - `rocrand_get_direction_vectors32(...)` supersedes:
    - `rocrand_h_sobol32_direction_vectors`
    - `rocrand_h_scrambled_sobol32_direction_vectors`
  - `rocrand_get_direction_vectors64(...)` supersedes:
    - `rocrand_h_sobol64_direction_vectors`
    - `rocrand_h_scrambled_sobol64_direction_vectors`
  - `rocrand_get_scramble_constants32(...)` supersedes `h_scrambled_sobol32_constants`
  - `rocrand_get_scramble_constants64(...)` supersedes `h_scrambled_sobol64_constants`
- Python 2.7 is no longer officially supported
- MRG31K3P pseudorandom number generator based on L'Ecuyer and Touzin, 2000, Fast combined multiple recursive generators with multipliers of the form a = ±2^q ± 2^r
- LFSR113 pseudorandom number generator based on L'Ecuyer, 1999, Tables of maximally equidistributed combined LFSR generators
- `SCRAMBLED_SOBOL32` and `SCRAMBLED_SOBOL64` quasirandom number generators (scrambled Sobol sequences are generated by scrambling the output of a Sobol sequence)
- The `mrg_<distribution>_distribution` structures, which provide numbers based on MRG32K3A, have been replaced by `mrg_engine_<distribution>_distribution`, where `<distribution>` is `log_normal`, `normal`, `poisson`, or `uniform`
  - These structures provide numbers for MRG31K3P (with template type `rocrand_state_mrg31k3p`) and MRG32K3A (with template type `rocrand_state_mrg32k3a`)
- Sobol64 now returns 64-bit (instead of 32-bit) random numbers, which reduces the performance of this generator
- Bug that prevented Windows code compilation in C++ mode (with a host compiler) when rocRAND headers were included
- New benchmark for the host API using Google Benchmark that replaces `benchmark_rocrand_generate`, which is deprecated
- Increased the number of warmup iterations for `rocrand_benchmark_generate` from 5 to 15 to eliminate corner cases that generate artificially high benchmark scores
- Backward compatibility for `#include <rocrand.h>` (deprecated) using wrapper header files
- Packages for test and benchmark executables on all supported operating systems using CPack
- Generating a random sequence of different sizes now produces the sequence without gaps, independent of how many values are generated per call
  - This applies only to XORWOW, MRG32K3A, PHILOX4X32_10, SOBOL32, and SOBOL64
  - This is only true if the size in each call is a divisor of the distribution's `output_width`, due to performance
  - The output pointer must be aligned with `output_width * sizeof(output_type)`
- hipRAND has been split into a separate package
- Header file installation location changed to match other libraries.
  - When using the `rocrand.h` header file, use `#include <rocrand/rocrand.h>` rather than `#include <rocrand.h>`
- rocRAND still includes hipRAND using a submodule
- The rocRAND package sets the provides field with hipRAND, so projects that require hipRAND can begin to specify it
- Offset behavior for XORWOW, MRG32K3A, and PHILOX4X32_10 generator
  - Setting offset now correctly generates the same sequence starting from the offset
  - Only uniform `int` and `float` will work, as these can be generated with a single call to the generator
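The offset guarantee above can be pictured with a toy generator in which every output depends only on the seed and the absolute position in the sequence. This is a conceptual sketch, not how XORWOW, MRG32K3A, or Philox are implemented in rocRAND; `ToyGenerator` and `mix` are hypothetical names, and the mixing function is a SplitMix64-style finalizer chosen purely for illustration.

```python
MASK64 = (1 << 64) - 1


def mix(seed, index):
    """SplitMix64-style hash of (seed, index); illustrative only."""
    z = (seed + (index + 1) * 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


class ToyGenerator:
    """Toy generator whose i-th value depends only on seed and position i."""

    def __init__(self, seed, offset=0):
        self.seed = seed
        self.position = offset  # absolute index of the next value

    def generate(self, count):
        """Return the next `count` 32-bit values and advance the position."""
        start = self.position
        self.position += count
        return [mix(self.seed, i) & 0xFFFFFFFF for i in range(start, start + count)]
```

In this model, a generator created with `offset=3` yields exactly the values a fresh generator would produce from its fourth output onward, which is the property the fix restores for the listed generators.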
- `kernel_xorwow` unit test is failing for certain GPU architectures
There are no updates for this ROCm release.
- Initial HIP on Windows support
- Packaging has been split into a runtime package (`rocrand`) and a development package (`rocrand-devel`): The development package depends on the runtime package. When installing the runtime package, the package manager will suggest the installation of the development package to aid users transitioning from the previous version's combined package. This suggestion is made on all supported operating systems except CentOS 7. The `suggestion` feature in the runtime package is introduced as a deprecated feature and will be removed in a future ROCm release.
- `mrg_uniform_distribution_double` is no longer generating an incorrect range of values
- Order of state calls for `log_normal`, `normal`, and `uniform`
- `kernel_xorwow` test is failing for certain GPU architectures
- Sobol64 support
- Benchmark time measurement improvement
- AddressSanitizer build option
- NVCC backend fix
- Fix ranges of MRG32k3a device functions
- gfx90a support
- gfx1030 support
- gfx803 support re-enabled
- Memory leaks in Poisson tests
- Memory leaks when generator is created, but setting seed/offset/dimensions throws an exception
- The rocRAND benchmark performance drop for `xorwow` has been fixed for older ROCm builds
- Ability to force install dependencies with new `-d` flag in install script
- rocRAND package name has been updated to support newer versions of ROCm
- rocRAND benchmark performance drop
- Debug builds via the install script
There are no updates for this ROCm release.
There are no updates for this ROCm release.
There are no updates for this ROCm release.
There are no updates for this ROCm release.
- Package naming now reflects operating system name and architecture
There are no updates for this ROCm release.
- Static library build options were added in the beta; these are subject to change (build method and naming) in future releases
- HIP-Clang is now the default compiler
- HCC build