Increase refcount to ze_loader/CUDA libraries when Level Zero/CUDA providers are used #1086

vinser52 · 2025-02-07T17:24:33Z

Description

Prior to this PR, the Level Zero provider did not call dlopen("libze_loader.so") at all. We assumed that the UMF client using the Level Zero provider had already loaded libze_loader.so into the process. But consider two clients in the same process:

The first client loads libze_loader.so into the process and uses the Level Zero provider. In that case, UMF initializes the Level Zero symbols via dlsym. The first client then destroys the Level Zero memory provider and unloads libze_loader.so.
The second client then loads libze_loader.so into the process and uses the Level Zero provider. The current implementation does not detect this situation: the Level Zero symbols are still considered initialized, although dlsym must be called again.

This PR fixes that. UMF now acquires a handle to libze_loader.so via dlopen. The RTLD_NOLOAD flag tells dlopen not to load the library and to succeed only if the library is already loaded. This increases the refcount of libze_loader.so, so the library is not unloaded when the first client calls dlclose.
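The refcount behavior described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the helper names pin_loaded_library, is_library_loaded and demo_two_clients are hypothetical, and libm.so.6 (glibc on Linux) stands in for libze_loader.so purely so the sketch can be built on a machine without Level Zero installed.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper (not UMF's actual code): take an extra reference on a
 * shared library that another component has already loaded.
 * Returns a handle on success, or NULL if the library is not resident. */
static void *pin_loaded_library(const char *soname) {
    /* RTLD_NOLOAD never loads the library; when the library is already
     * resident, dlopen returns a handle and bumps its reference count. */
    return dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);
}

/* Hypothetical helper: report whether a library is currently resident,
 * leaving its reference count unchanged overall. */
static int is_library_loaded(const char *soname) {
    void *probe = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);
    if (probe == NULL) {
        return 0;
    }
    dlclose(probe); /* drop the reference taken by the probe itself */
    return 1;
}

/* Replays the scenario from the description with a stand-in library
 * (assumption: libm.so.6 exists, i.e. glibc on Linux). */
static void demo_two_clients(void) {
    /* The "first client" loads the library. */
    void *client = dlopen("libm.so.6", RTLD_LAZY | RTLD_LOCAL);
    assert(client != NULL);

    /* The "provider" pins it without loading anything new. */
    void *pin = pin_loaded_library("libm.so.6");
    assert(pin != NULL);

    /* The client unloads; the pin keeps the library resident, so symbols
     * resolved through the pinned handle remain usable. */
    dlclose(client);
    assert(is_library_loaded("libm.so.6"));
    assert(dlsym(pin, "cos") != NULL);

    /* The provider releases its reference when it is torn down. */
    dlclose(pin);
}
```

Calling demo_two_clients() from a main function exercises the whole sequence; on older glibc versions the program may need to be linked with -ldl.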

This PR should fix #926.

Fixes: intel/llvm#16944 (confirmed by @ldorau)

Checklist

Code compiles without errors locally
All tests pass locally
CI workflows execute properly
New tests added, especially if they will fail without my changes
All newly added source files have a license
All newly added source files are referenced in CMake files

bratpiorka · 2025-02-10T09:24:32Z

I know this is probably not easy, but could we add the simplest test for this case?

vinser52 · 2025-02-10T10:21:43Z

> I know this is probably not easy, but could we add the simplest test for this case?

Yeah, I am working on it; that's why this PR is a draft.
Also, I will take a look at the CUDA provider; I believe it requires the same changes.

src/provider/provider_level_zero.c

src/provider/provider_cuda.c

ldorau · 2025-02-14T13:39:27Z

@vinser52 @bratpiorka @pbalcer This PR fixes intel/llvm#16944

vinser52 · 2025-02-14T13:58:46Z

> I know this is probably not easy, but could we add the simplest test for this case?

OK, new tests have been added. Specifically, I split umf-provider_level_zero_dlopen and umf-provider_cuda_dlopen into umf-provider_level_zero_dlopen_global/umf-provider_level_zero_dlopen_local and umf-provider_cuda_dlopen_global/umf-provider_cuda_dlopen_local, respectively.

Without my changes, the umf-provider_level_zero_dlopen_local and umf-provider_cuda_dlopen_local tests fail.
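For context on the _local variants: a library opened with RTLD_LOCAL does not add its symbols to the process-wide lookup scope, so they are reachable only through a handle to that library. The sketch below contrasts the two lookup styles. It is an illustration under assumptions, not the test code from this PR: lookup_global, lookup_via_handle and demo_local_lookup are hypothetical names, libm.so.6 (glibc on Linux) stands in for libze_loader.so or the CUDA library, and whether this is the exact reason the _local tests failed before the change is an inference from the description.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look a symbol up in the global scope only.
 * dlopen(NULL, ...) returns a handle for the main program; searching it
 * covers libraries loaded at startup and those opened with RTLD_GLOBAL,
 * but not libraries opened with RTLD_LOCAL. */
static void *lookup_global(const char *symbol) {
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL) {
        return NULL;
    }
    void *addr = dlsym(self, symbol);
    dlclose(self);
    return addr;
}

/* Hypothetical helper: look a symbol up through an explicit handle to an
 * already-loaded library. On success *out_handle holds a reference that the
 * caller must release with dlclose; on failure *out_handle is NULL. */
static void *lookup_via_handle(const char *soname, const char *symbol,
                               void **out_handle) {
    *out_handle = NULL;
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL) {
        return NULL; /* library is not resident */
    }
    void *addr = dlsym(handle, symbol);
    if (addr == NULL) {
        dlclose(handle); /* nothing to keep alive */
        return NULL;
    }
    *out_handle = handle;
    return addr;
}

/* A client opens the stand-in library with RTLD_LOCAL; the handle-based
 * lookup still resolves the symbol (assumption: libm.so.6 exists). */
static void demo_local_lookup(void) {
    void *client = dlopen("libm.so.6", RTLD_LAZY | RTLD_LOCAL);
    assert(client != NULL);

    void *handle = NULL;
    void *fn = lookup_via_handle("libm.so.6", "cos", &handle);
    assert(fn != NULL && handle != NULL);

    dlclose(handle);
    dlclose(client);
}
```

The handle-based lookup works regardless of whether the client used RTLD_GLOBAL or RTLD_LOCAL, which is why holding a handle is the more robust choice for a component that does not control how its dependency was loaded.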

vinser52 · 2025-02-14T13:59:08Z

> @vinser52 @bratpiorka @pbalcer This PR fixes intel/llvm#16944

So this PR is ready for review.

ldorau · 2025-02-14T14:40:01Z

@lukaszstolarczuk @bratpiorka @pbalcer Hanging on Windows CUDA CI builds ...

src/provider/provider_level_zero.c

vinser52 requested a review from igchor February 7, 2025 17:25

vinser52 force-pushed the svinogra_l0_linking branch from 809e6ab to a24a920 Compare February 7, 2025 17:33

igchor mentioned this pull request Feb 7, 2025

[CTS] add UMF integration test oneapi-src/unified-runtime#2677

Draft

vinser52 force-pushed the svinogra_l0_linking branch 2 times, most recently from 0e2989c to e75bed8 Compare February 8, 2025 00:20

bratpiorka added this to the v0.11.x milestone Feb 10, 2025

vinser52 force-pushed the svinogra_l0_linking branch from e75bed8 to 2bdefbd Compare February 10, 2025 13:38

vinser52 changed the title ~~Increase refcount to ze_loader library when Level Zero provider is used~~ Increase refcount to ze_loader/CUDA libraries when Level Zero/CUDA providers are used Feb 10, 2025

vinser52 force-pushed the svinogra_l0_linking branch 6 times, most recently from 63c5902 to 96624e6 Compare February 11, 2025 21:26

vinser52 mentioned this pull request Feb 11, 2025

Refactor Level Zero and CUDA tests #1094

Merged

3 tasks

vinser52 force-pushed the svinogra_l0_linking branch 13 times, most recently from 42576d2 to 8246397 Compare February 13, 2025 15:40

vinser52 mentioned this pull request Feb 13, 2025

Do not overwrite ret code in level_zero_shared_memory example #1098

Merged

3 tasks

vinser52 force-pushed the svinogra_l0_linking branch 4 times, most recently from 6b0e8e8 to 6271502 Compare February 13, 2025 22:00

PatKamin reviewed Feb 14, 2025

src/provider/provider_level_zero.c (outdated)

src/provider/provider_level_zero.c (outdated)

src/provider/provider_cuda.c (outdated)

Increase refcount to ze_loader library when Level Zero provider is used

1db4c48

vinser52 force-pushed the svinogra_l0_linking branch from 6271502 to 7a78113 Compare February 14, 2025 13:52

vinser52 added 4 commits February 14, 2025 14:54

Increase refcount to CUDA library when CUDA provider is used

c715e62

Fix LD_LIBRARY_PATH for tests that use libze_loader

3b738d5

Test Level Zero provider when ze_loader is opened with RTLD_LOCAL

b0dd2fc

Test CUDA provider when cuda is opened with RTLD_LOCAL

664484f

vinser52 force-pushed the svinogra_l0_linking branch from 7a78113 to 664484f Compare February 14, 2025 13:55

vinser52 marked this pull request as ready for review February 14, 2025 13:55

vinser52 requested a review from a team as a code owner February 14, 2025 13:55

vinser52 requested review from PatKamin, ldorau and bratpiorka February 14, 2025 13:59

ldorau approved these changes Feb 14, 2025

ldorau mentioned this pull request Feb 14, 2025

[SYCL][CUDA] Nsys profiling broken after memory providers change intel/llvm#16944

Closed

igchor reviewed Feb 14, 2025

src/provider/provider_level_zero.c

igchor approved these changes Feb 14, 2025

ldorau merged commit 5a515c5 into oneapi-src:main Feb 17, 2025
77 checks passed
