Releases · ROCm/TransferBench

28 Nov 00:12

gilbertlee-amd

v1.57.00

062b581

TransferBench v1.57.00


Modified

Removed use of the defaulted three-way comparison ("starship") operator, dropping the C++20 requirement so that TransferBench compiles on more OSes
Changed how the version is reported. The client version is now just the last two digits; it increments only for releases
that leave the backend header-only library file unchanged, and resets to 0 whenever the header is updated
GFX_SINGLE_TEAM=0 is set by default
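For code that needs the same portability, dropping a defaulted C++20 operator<=> usually means writing the needed comparisons by hand so they compile under C++17. The following is a minimal sketch using a hypothetical ExeKey struct (not TransferBench's actual type); std::tie gives the same member-by-member lexicographic ordering that the defaulted operator would.

```cpp
#include <tuple>

// Hypothetical key type, for illustration only.
struct ExeKey {
  int exeType;      // executor type (e.g. CPU, GFX, DMA)
  int exeIndex;     // device index
  int exeSubIndex;  // sub-executor index

  // C++20 only: auto operator<=>(const ExeKey&) const = default;
  // C++17-compatible replacement: compare members lexicographically.
  bool operator<(const ExeKey& other) const {
    return std::tie(exeType, exeIndex, exeSubIndex) <
           std::tie(other.exeType, other.exeIndex, other.exeSubIndex);
  }
  bool operator==(const ExeKey& other) const {
    return std::tie(exeType, exeIndex, exeSubIndex) ==
           std::tie(other.exeType, other.exeIndex, other.exeSubIndex);
  }
  bool operator!=(const ExeKey& other) const { return !(*this == other); }
};
```

Only the operators actually used need to be provided; for example, operator< alone is enough for a struct used as a std::map key.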


26 Nov 20:37

gilbertlee-amd

v1.56

83fc9b3

TransferBench v1.56


Fixed

Fixed a bug in interactive mode. Interactive mode now starts before all warmup iterations


26 Nov 20:37

gilbertlee-amd

v1.55

9f68d14

TransferBench v1.55 (Pre-release)



Fixed

Fixed a missing-header error when compiling on CentOS
Fixed issues when using multi-stream mode for the GFX executor


21 Nov 23:32

gilbertlee-amd

v1.54

02ce785

TransferBench v1.54


Modified

Refactored TransferBench into a header-only library combined with a thin client to facilitate the
use of TransferBench as the backend for other applications
Optimized data validation: data is now generated only once, which should speed up Tests with many parallel
Transfers
Preset benchmarks no longer take any extra command-line arguments; preset settings are accessed only via
environment variables. Details for each preset are now printed
The a2a preset benchmark now defaults to using fine-grained memory and a GFX unroll factor of 2
Refactored how Transfers are launched in parallel, reducing some CPU-side overhead
CPU and DMA executor timings now use CPU wall-clock time instead of the slowest Transfer time
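The data-validation change can be pictured with a small sketch: the expected pattern is generated once, and every destination buffer is checked against that single shared copy instead of regenerating it per Transfer. The function names and fill pattern below are hypothetical; these notes do not describe TransferBench's actual pattern or buffer handling.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical deterministic fill pattern for element i.
inline float PatternValue(std::size_t i) {
  return static_cast<float>(i % 383) * 0.5f;  // arbitrary but repeatable
}

// Generate the expected source data once, up front.
std::vector<float> GenerateReference(std::size_t numFloats) {
  std::vector<float> ref(numFloats);
  for (std::size_t i = 0; i < numFloats; ++i) ref[i] = PatternValue(i);
  return ref;
}

// Validate every destination buffer against the same shared reference.
// Returns the index of the first mismatching destination, or -1 if all match.
int ValidateAll(const std::vector<float>& ref,
                const std::vector<std::vector<float>>& dsts) {
  for (std::size_t d = 0; d < dsts.size(); ++d) {
    if (dsts[d] != ref) return static_cast<int>(d);  // compares size and contents
  }
  return -1;
}
```

The cost of generating the reference is paid once per Test rather than once per Transfer, which is where the savings come from when many Transfers run in parallel.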

Added

New one2all preset, which sweeps over all subsets of parallel Transfers from one GPU to the others
New warnings for DMA execution about how HIP defaults to using agents from the source memory
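To give a sense of the one2all sweep size, the sketch below enumerates every non-empty subset of destination GPUs for a single source GPU using a bitmask. This is an illustrative assumption about how such a sweep could be enumerated, not TransferBench's code; with N GPUs it produces 2^(N-1) - 1 subsets (127 on an 8-GPU node).

```cpp
#include <vector>

// Enumerate every non-empty subset of destination GPUs for a source GPU.
// Each subset is one candidate set of parallel Transfers (src -> each dst).
std::vector<std::vector<int>> EnumerateDstSubsets(int srcGpu, int numGpus) {
  std::vector<int> others;
  for (int g = 0; g < numGpus; ++g)
    if (g != srcGpu) others.push_back(g);

  std::vector<std::vector<int>> subsets;
  const unsigned n = static_cast<unsigned>(others.size());
  for (unsigned mask = 1; mask < (1u << n); ++mask) {  // skip the empty set
    std::vector<int> subset;
    for (unsigned b = 0; b < n; ++b)
      if (mask & (1u << b)) subset.push_back(others[b]);
    subsets.push_back(subset);
  }
  return subsets;
}
```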

Removed

CU scaling preset has been removed. Similar functionality already exists in the schmoo preset benchmark
Preparation of source data via GFX kernel has been removed (USE_PREP_KERNEL)
Removed GFX block-reordering (BLOCK_ORDER)
Moved NUM_CPU_DEVICES and NUM_GPU_DEVICES out of the common env vars and into only the presets they apply to
Removed the SHARED_MEM_BYTES option for the GFX executor
Removed USE_PCIE_INDEX

Fixed

Fixed a potential timing-reporting issue when DMA-executed Transfers get serialized


11 Nov 06:59

gilbertlee-amd

v1.53

b56d481

TransferBench v1.53


Added

Added the ability to specify NULL as the source or destination memory type for the sweep preset


09 Oct 16:49

gilbertlee-amd

v1.52

600cf13

TransferBench v1.52

Added

Added USE_HSA_DMA env var to switch to using hsa_amd_memory_async_copy instead of hipMemcpyAsync for DMA execution
Added the ability to set the USE_GPU_DMA env var for the a2a benchmark
Added a check for large BAR enablement on GPU devices during the topology check

Fixed

Fixed a potential memory leak when HSA reports 0 hops between GPUs and CPUs


15 Aug 17:46

gilbertlee-amd

v1.51

b30aefb

TransferBench v1.51


Modified

CSV output has been modified slightly to match normal terminal output
Output for non-single-stream mode now matches single-stream mode (results per Executor)

Added

Support for sub-iterations via NUM_SUBITERATIONS. This allows additional looping within an iteration.
If set to 0, this should loop infinitely (which may be useful for some debugging purposes)
Support for a variable number of subExecutors (currently for the GPU GFX executor only). Setting subExecutors to
0 sweeps over a range of CU counts and reports only the results of the best one found. The search can be sped up
by setting the MIN_VAR_SUBEXEC and MAX_VAR_SUBEXEC environment variables to narrow the search space.
All variable-subExecutor Transfers use the same number of CUs
Experimental new "healthcheck" preset config, which currently supports only the MI300 series. This preset runs
CPU-to-GPU bandwidth tests and all-to-all XGMI bandwidth tests and compares the results against expected values.
Pass-criteria limits can be adjusted (to account for platform differences) via the environment variables
LIMIT_UDIR (unidirectional), LIMIT_BDIR (bidirectional), and LIMIT_A2A (per GPU-GPU link bandwidth)
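The variable-subExecutor search can be pictured as a best-of sweep over CU counts. In this minimal sketch, the measure callback is a hypothetical stand-in for running the Transfers with a given CU count, and minCUs/maxCUs play the role of MIN_VAR_SUBEXEC and MAX_VAR_SUBEXEC; a narrower range simply means fewer measurements. TransferBench's own search logic may differ.

```cpp
#include <functional>

struct SweepResult {
  int bestNumCUs = 0;
  double bestBandwidthGBs = 0.0;
};

// Try every CU count in [minCUs, maxCUs] and keep the best-performing one.
// 'measure' (hypothetical) returns the bandwidth achieved with a given CU count.
SweepResult FindBestSubExecCount(int minCUs, int maxCUs,
                                 const std::function<double(int)>& measure) {
  SweepResult best;
  for (int cus = minCUs; cus <= maxCUs; ++cus) {
    const double bw = measure(cus);
    if (bw > best.bestBandwidthGBs) {  // strict '>' keeps the smallest CU count on ties
      best.bestBandwidthGBs = bw;
      best.bestNumCUs = cus;
    }
  }
  return best;
}
```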

Fixed

Fixed out-of-bounds memory access during topology detection that can happen if the number of
CPUs is less than the number of NUMA domains
Fixed CU masking functionality on multi-XCD architectures (e.g. MI300)


03 Apr 16:27

gilbertlee-amd

v1.50

eaf32b4

TransferBench v1.50

Added

Added a new parallel copy preset benchmark (pcopy)
- Usage: ./TransferBench pcopy <numBytes=64M> <#CUs=8> <srcGpu=0> <minGpus=1> <maxGpus=#GPU-1>
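The numBytes argument above is written with a size suffix (64M). The following sketch shows one way such a value could be parsed; the 1024-based K/M/G interpretation and the function name are assumptions for illustration, not something these release notes state.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Parse a byte count with an optional K/M/G suffix (assumed 1024-based).
// Returns 0 on malformed input.
std::uint64_t ParseByteCount(const std::string& s) {
  if (s.empty()) return 0;
  std::uint64_t multiplier = 1;
  std::string digits = s;
  switch (std::toupper(static_cast<unsigned char>(s.back()))) {
    case 'K': multiplier = 1ull << 10; digits.pop_back(); break;
    case 'M': multiplier = 1ull << 20; digits.pop_back(); break;
    case 'G': multiplier = 1ull << 30; digits.pop_back(); break;
    default: break;  // no suffix: plain byte count
  }
  if (digits.empty()) return 0;
  for (char c : digits)
    if (!std::isdigit(static_cast<unsigned char>(c))) return 0;
  return std::stoull(digits) * multiplier;
}
```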

Fixed

Removed non-copy DMA Transfers (these previously used hipMemset)
Fixed the CPU executor when operating on a null destination


02 Apr 22:38

gilbertlee-amd

v1.49

97fbbbb

TransferBench v1.49

Fixes

The topology display now enumerates previously missed DMA engines that are used only for CPU traffic


02 Feb 22:46

gilbertlee-amd

v1.48

aa801b9

TransferBench v1.48


Fixes

Various fixes for TransferBenchCuda

Additions

Support for targeting specific DMA engines via executor subindex (e.g. D0.1)
Printing warnings when executors are overcommitted
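The executor subindex syntax (e.g. D0.1, presumably DMA executor 0, engine 1) could be parsed as sketched below. This is a hypothetical parser for illustration; the executor letters TransferBench accepts and its error handling may differ. A missing subindex is reported as -1.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct ExecutorSpec {
  char type;     // executor letter, e.g. 'D'
  int index;     // executor (device) index
  int subIndex;  // -1 when no ".N" suffix is given
};

// Parse strings of the form <letter><index>[.<subIndex>].
std::optional<ExecutorSpec> ParseExecutor(const std::string& s) {
  if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0]))) return std::nullopt;
  ExecutorSpec spec{static_cast<char>(std::toupper(static_cast<unsigned char>(s[0]))), 0, -1};
  std::size_t pos = 1;

  // Read a run of decimal digits starting at pos; false if there are none.
  auto readInt = [&](int& out) -> bool {
    const std::size_t start = pos;
    out = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
      out = out * 10 + (s[pos] - '0');
      ++pos;
    }
    return pos > start;
  };

  if (!readInt(spec.index)) return std::nullopt;
  if (pos < s.size()) {
    if (s[pos] != '.') return std::nullopt;
    ++pos;
    if (!readInt(spec.subIndex)) return std::nullopt;
  }
  if (pos != s.size()) return std::nullopt;
  return spec;
}
```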

Modifications

USE_REMOTE_READ is now supported for the rwrite preset benchmark
