Increase refcount to ze_loader/CUDA libraries when Level Zero/CUDA providers are used #1086
I know this is probably not easy, but could we add the simplest test for this case?
Yeah, I am working on it; that's why this PR is a draft.
@vinser52 @bratpiorka @pbalcer This PR fixes intel/llvm#16944
Ok, new tests have been added. Actually, I split the [...] Without my changes the [...]
so this PR is ready for review.
@lukaszstolarczuk @bratpiorka @pbalcer Hanging on Windows CUDA CI builds ...
Description
Prior to this PR, the Level Zero provider did not call `dlopen("libze_loader.so")` at all. We assumed that the UMF client using the Level Zero provider had already loaded `libze_loader.so` into the process. But what if there are two clients in the process:

1. The first client loads `libze_loader.so` into the process and uses the Level Zero provider. In that case, UMF initializes the Level Zero symbols via `dlsym`. Then the first client destroys the Level Zero memory provider and unloads `libze_loader.so`.
2. The second client loads `libze_loader.so` into the process and uses the Level Zero provider. Our current implementation does not detect this situation: the Level Zero symbols are still considered initialized, although the `dlsym` lookups should be done again.

This PR fixes that. UMF now acquires its own handle to `libze_loader.so` via `dlopen`. The `RTLD_NOLOAD` flag tells `dlopen` not to load the library and to succeed only if the library is already loaded. This lets UMF increase the refcount of `libze_loader.so`, so the library is not unloaded when the first client calls `dlclose`.

This PR should fix #926.
Fixes: intel/llvm#16944 (confirmed by @ldorau)
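
The approach described above can be illustrated with a minimal sketch. It is not the code from this PR: the helper names `pin_library_if_loaded`, `resolve_symbol`, and `unpin_library` are hypothetical, and the sketch assumes a Linux/glibc-style `dlopen` where `RTLD_NOLOAD` is available and a successful call takes an additional reference on the library.

```c
#define _GNU_SOURCE /* RTLD_NOLOAD is an extension, not part of POSIX */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: take an extra reference on a library that some
 * other component has already loaded. Returns NULL if the library is
 * not currently resident in the process (nothing is loaded in that case). */
void *pin_library_if_loaded(const char *soname) {
    assert(soname != NULL);
    /* RTLD_NOLOAD: succeed only if the library is already loaded. */
    return dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);
}

/* Hypothetical helper: look up a symbol through the pinned handle, so the
 * resulting pointer stays valid for as long as the handle is held. */
void *resolve_symbol(void *handle, const char *name) {
    return handle != NULL ? dlsym(handle, name) : NULL;
}

/* Hypothetical helper: drop the extra reference taken above. */
void unpin_library(void *handle) {
    if (handle != NULL) {
        dlclose(handle);
    }
}
```

Under these assumptions, a provider could call `pin_library_if_loaded("libze_loader.so")` during initialization, resolve entry points such as `zeInit` through the returned handle, and call `unpin_library` on teardown. While the handle is held, a client's own `dlclose` only decrements the refcount, so the resolved function pointers remain valid.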
Checklist