
UCT/CUDA: Runtime CUDA >= 12.3 to enable VMM #10396

Open · tvegas1 wants to merge 16 commits into master from cuda_ctx_set_flags_runtime
Conversation

@tvegas1 tvegas1 commented Dec 20, 2024

What?

Do not use cuCtxSetFlags() if the CUDA driver does not support it.

Why?

An unresolved symbol for cuCtxSetFlags() on CUDA driver < 12.1 causes a crash.

How?

Assumptions:

  • cuCtxSetFlags() is only needed for VMM, for which UCX support requires CUDA driver >= 12.3
  • cuCtxSetFlags() is not strictly needed for malloc async

Testing

Locally tested; needs final testing on a platform with an actual older driver.

UCX_IB_GPU_DIRECT_RDMA=no ./rfs/bin/ucx_perftest -t tag_bw -m cuda 

@tvegas1 tvegas1 force-pushed the cuda_ctx_set_flags_runtime branch from 1ce967f to 68a5f51 Compare December 20, 2024 10:46
yosefe commented Dec 20, 2024

We have tests for different CUDA versions, which include CUDA memory hooks (for example, Test Cuda Docker ubuntu18_cuda_12_0). Can we add a test that would have caught the new API usage?

tvegas1 commented Jan 6, 2025

@yosefe, do we need this before release?

tvegas1 commented Jan 6, 2025

> we have tests for different cuda versions, which include cuda memory hooks (for example, Test Cuda Docker ubuntu18_cuda_12_0). can we add a test that would have caught the new api usage?

I think it is difficult, because we would need to build with a later driver version and run with an older one. For instance, when I run this container on rock, we are only running the later driver version, and I don't think we can easily switch driver versions, since the driver has to match the kernel module, as per my understanding.

root@905eb7691066:/# readelf -a /usr/lib/x86_64-linux-gnu/libcuda.so | grep -w cuCtxSetFlags
   731: 00000000002516f0    30 FUNC    GLOBAL DEFAULT   13 cuCtxSetFlags

}
#else
unsigned value = 1;
(void)ctx_set_flags_func;
Contributor:

why needed?
maybe we could just remove #if HAVE_CUDA_FABRIC now, since we don't use cuCtxSetFlags directly?

Contributor Author:

fixed

Contributor Author:

restored as it is needed by CU_CTX_SYNC_MEMOPS

{
static ucs_status_t status = UCS_ERR_LAST;

#if CUDA_VERSION >= 12000
Contributor:

why needed?

Contributor Author:

The cuGetProcAddress() prototype changed at >= 12000, and we know cuCtxSetFlags() also only appeared after 12000, so there is no need to support the older cuGetProcAddress() prototype for this check.

@@ -823,6 +834,37 @@ static uct_md_ops_t md_ops = {
.detect_memory_type = uct_cuda_copy_md_detect_memory_type
};

static ucs_status_t uct_cuda_copy_md_check_is_ctx_set_flags_supported(void)
Contributor:

To simplify the code, we could have this function call the needed function pointer, and move the global var inside it.
Something like
ucs_status_t uct_cuda_copy_set_ctx_flags(unsigned flags)
and have it return UCS_ERR_UNSUPPORTED if the func pointer is not found.

Contributor Author:

I thought about it, but went for a two-step approach, as we need to:

  1. disable fabric at init time
  2. set the flag with md and address as parameters, in case we cannot use cuCtxSetFlags()

@tvegas1 tvegas1 force-pushed the cuda_ctx_set_flags_runtime branch from fd0d161 to f8f88ae Compare January 6, 2025 18:48
@tvegas1 tvegas1 force-pushed the cuda_ctx_set_flags_runtime branch from f8f88ae to 6563253 Compare January 6, 2025 18:49
@tvegas1 tvegas1 force-pushed the cuda_ctx_set_flags_runtime branch from 7acee45 to ff4313c Compare January 7, 2025 09:07
brminich
brminich previously approved these changes Jan 7, 2025
rakhmets
rakhmets previously approved these changes Jan 7, 2025
brminich
brminich previously approved these changes Jan 8, 2025

brminich commented Jan 9, 2025

@yosefe, pls review

}

ucs_diag("disabled fabric memory allocations");
md->config.enable_fabric = UCS_NO;
Contributor:

Looks like it affects only cuda_copy memory allocations, but what happens if we get fabric memory from a user buffer and then don't actually set sync memops for it?
We could return UNSUPPORTED from uct_cuda_copy_sync_memops, and if not supported, return an error from CUDA memory detection.

Contributor Author:

this should now be handled, right?

@tvegas1 tvegas1 dismissed stale reviews from brminich and rakhmets via 03094e5 January 10, 2025 10:21
@tvegas1 tvegas1 force-pushed the cuda_ctx_set_flags_runtime branch from 03094e5 to da07d62 Compare January 10, 2025 10:24
@tvegas1 tvegas1 force-pushed the cuda_ctx_set_flags_runtime branch from da07d62 to 8657d54 Compare January 10, 2025 10:33
@@ -636,7 +667,7 @@ uct_cuda_copy_md_query_attributes(uct_cuda_copy_md_t *md, const void *address,
return UCS_ERR_NO_DEVICE;
}

-uct_cuda_copy_sync_memops(md, address);
+uct_cuda_copy_sync_memops(md, address, is_vmm);
Contributor:

should we call it also from cuda allocate flow?

Contributor Author:

this could end up calling cuPointerSetAttribute() twice when the set-flags function is not available

Contributor:

where would be the 2nd time? AFAIK we don't call pointer query on allocated memory (such as rndv fragments)

Contributor Author:

ok

CUdriverProcAddressQueryResult sym_status;
CUresult cu_err;
ucs_status_t status;
uct_cuda_cuCtxSetFlags_t cuda_cuCtxSetFlags_func =
Contributor:

  1. initialized vars should be first
  2. should be static??

Contributor Author:

thanks, missed the static

@@ -553,8 +554,7 @@ static void uct_cuda_copy_sync_memops(uct_cuda_copy_md_t *md,

if (is_vmm) {
ucs_fatal("failed to set sync_memops on CUDA VMM without "
-          "cuCtxSetFlags() (address=%p)",
-          address);
+          "cuCtxSetFlags() (address=%p)", address);
Contributor:

Thinking of it again, it should be a warning, since a failure in the cuPointerSetAttribute() call is also a warning

Contributor Author:

so when is_vmm == 1, you want to call cuPointerSetAttribute() and let it fail, right?

Contributor Author:

moved to ucs_warn

Contributor:

hmm right, actually we can return from the function after ucs_warn, and not call cuPointerSetAttribute at all


@@ -379,6 +431,9 @@ uct_cuda_copy_mem_alloc(uct_md_h uct_md, size_t *length_p, void **address_p,
}

allocated:
uct_cuda_copy_sync_memops(md, (void *)alloc_handle->ptr,
Contributor:

i wonder if it will work with MANAGED memory ... maybe only on coherent platforms that allow managed memory registration with ODP?

Contributor Author:

I would expect it to work on managed memory, but shall I remove that line, since we want to backport to v1.18?

Contributor:

we can add memory type check during the backport to v1.18.x

}

if (is_vmm) {
ucs_warn("failed to set sync_memops on CUDA VMM without "
Contributor:

@tvegas1 Current changes look good to me, but @yosefe brought up an issue where the library is built against a >= 12.3 compatible driver version while the system where that library gets used has a driver version < 12.1. On such a system, VMM/mallocAsync allocations are allowed (as VMM and mallocAsync are supported on driver versions < 12.1), but there would be a need to report an error or fail even if UCX isn't compiled with HAVE_CUDA_FABRIC (driver version >= 12.3). The condition met here is when UCX is built with a >= 12.3 driver.

Contributor Author:

agree, that's the case where VMM is independently allocated but we still don't have HAVE_FABRIC set; in this case I will move is_vmm out of the #ifdef and fatal if is_vmm == 1.

Contributor:

i don't think it will help - if HAVE_FABRIC is not set, we will never know in UCX that it is VMM memory and will assume it is legacy memory. Then we can only hope that cuPointerSetAttribute() would fail.

Contributor Author (tvegas1, Jan 13, 2025):

yes, also enabled VMM detection because:

  • cuMemRelease >= 10.2
  • cuMemRetainAllocationHandle >= 11.0
  • cuMemGetAllocationPropertiesFromHandle >= 10.2

assuming we build UCX with CUDA >= 11 anyway

Contributor Author:

double-checked the actual function prototype against the online documentation of older CUDA releases
