Tags: NVIDIA/nccl-tests

v2.14.1

Add PCI domain and device ID for GPU device BDF display

v2.14.0

Perftests: Introduce NCCL_TESTS_SPLIT env

`NCCL_TESTS_SPLIT` provides a new way of computing the color used to split communicators.

It is overridden by `NCCL_TESTS_SPLIT_MASK` if that variable is also set.

Examples:

NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7"  # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72"   # color = rank % 72.  One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72"   # color = rank / 72.  Intra NVLink domain on NVL72.

You can also use the shorthand operators "%", "&", "|", and "/".
Extra spaces in the middle are ignored.
Operator names are not case sensitive.

The following are all equivalent:

NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"
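
The rules above can be illustrated with a minimal Python sketch. It is not the nccl-tests implementation: `split_color` is a hypothetical helper, the word form `OR` for "|" is an assumption (only the symbol is listed above), and numeric parsing relies on Python's `int(text, 0)`, which accepts decimal, `0x`, and `0b` literals.

```python
import re

# Word operators. "OR" is an assumed spelling for the "|" shorthand.
_OPS = {
    "AND": lambda rank, value: rank & value,
    "OR": lambda rank, value: rank | value,
    "MOD": lambda rank, value: rank % value,
    "DIV": lambda rank, value: rank // value,
}
# Shorthand symbols mapped to their word form.
_SYMBOLS = {"&": "AND", "|": "OR", "%": "MOD", "/": "DIV"}
# Operator, optional spaces, then the operand; operator case is ignored.
_SPEC = re.compile(r"^\s*(AND|OR|MOD|DIV|[&|%/])\s*(\S+)\s*$", re.IGNORECASE)


def split_color(spec, rank):
    """Return the color for `rank` given a spec such as "AND 0x7" or "%72"."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    op = match.group(1).upper()
    op = _SYMBOLS.get(op, op)        # normalize "&" to "AND", and so on
    value = int(match.group(2), 0)   # base 0 accepts 72, 0x7, 0b111
    return _OPS[op](rank, value)
```

With this sketch, `split_color("MOD 72", 75)` evaluates to 3 and `split_color("DIV 72", 75)` evaluates to 1, so rank 75 would share a "MOD" color with ranks 3, 147, and so on, and a "DIV" color with ranks 72 through 143.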

v2.13.13

Update CUDA gencodes

Add support for Blackwell sm100 and sm120 from CUDA 12.8

Add support for Hopper sm90 from CUDA 12.0

v2.13.12

Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes.

This ensures that runs with a non-power-of-two number of ranks have buffer addresses aligned suitably for performance.
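
The trimming idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `per_rank_bytes` and `ALIGN_BYTES` are hypothetical names, and the actual tests may do the rounding on element counts rather than byte counts.

```python
ALIGN_BYTES = 16  # alignment named in the release note


def per_rank_bytes(total_bytes, nranks):
    """Split a buffer evenly across ranks, rounding each share down to 16 bytes."""
    share = total_bytes // nranks
    return share - (share % ALIGN_BYTES)
```

For example, 1000 bytes over 3 ranks gives a raw share of 333 bytes, which is trimmed to 320. Each rank's offset (`rank * 320`) is then a multiple of 16, which would not hold for the untrimmed share.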

v2.13.11

Merge pull request #259 from NVIDIA/fix-ncclstringtotype

Future-proof ncclstringtotype

v2.13.10

Added -N,--run_cycles option

v2.13.9

Added missing MPI_Comm_free() call before MPI_Finalize()

v2.13.8

Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue

v2.13.7

Make the -c option a datacheck iteration count parameter

The default is 1.

v2.13.6

Add boot_id to the hostname hash due to collisions on Azure

Fixes #60
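
A rough Python sketch of why mixing in the boot ID helps: two machines that report the same hostname still get different hashes as long as their boot IDs differ. The names `host_hash` and `read_boot_id` are hypothetical, SHA-256 is a stand-in for whatever hash the tests use, and the boot ID path is the standard Linux location.

```python
import hashlib


def read_boot_id(path="/proc/sys/kernel/random/boot_id"):
    """Return the Linux boot ID, or an empty string if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def host_hash(hostname, boot_id):
    """Hash hostname and boot ID together into a 64-bit integer."""
    digest = hashlib.sha256(f"{hostname}|{boot_id}".encode()).digest()
    return int.from_bytes(digest[:8], "little")
```

A caller would typically combine `socket.gethostname()` with `read_boot_id()` and compare the resulting hashes across ranks to decide which ranks share a host.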