Can't access GPUs, get "ERROR: CUDA error: invalid device context (code 201, ERROR_INVALID_CONTEXT)" #620
Comments
Indeed, once part of the stack fails to initialize, all subsequent errors are meaningless; only the reason it failed to initialize in the first place matters. You should have a libcudadevrt somewhere; if not, your installation is broken or missing pieces. Do you have that file somewhere? If so, CUDAapi.jl is not detecting it; try running with JULIA_DEBUG=CUDAapi. As a last resort, you could also upgrade and use the latest CuArrays/CUDAnative, which install CUDA automatically using artifacts.
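To answer the "do you have that file somewhere?" question, a small search helper along these lines may be useful. It is only a sketch: the function name `find_cudadevrt` is made up for illustration, and the directories you pass in (for example `/usr/local/cuda`) are guesses at common install locations, not something confirmed in this thread.

```shell
# find_cudadevrt DIR...
# Hypothetical helper: list every file named libcudadevrt* under the given
# directories. -H makes find follow a symlinked starting directory, which
# matters when the install root is itself a symlink.
find_cudadevrt() {
  for root in "$@"; do
    # Skip candidates that do not exist on this machine.
    [ -d "$root" ] || continue
    find -H "$root" -name 'libcudadevrt*' 2>/dev/null
  done
}
```

For example, `find_cudadevrt /usr/local/cuda /opt/cuda` prints one path per match and nothing if the library is absent. If the file turns up in a directory such as lib64, comparing that location with the JULIA_DEBUG=CUDAapi output should show whether the detection logic ever looks there.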
Thank you for your reply. Here is some more information, including what you suggested, but also some commands showing where the libcudadevrt library is located:
... it seems like it's not looking in the lib64 directory. Is there an easy way to instruct Julia to do that? Or should I ask my admin to add a symbolic link from lib to lib64 (which may be dangerous for other systems on that box, I don't know)?
More data: I tried making a fake CUDA home with symbolic links pointing to the real one, and added a "lib" link pointing to "lib64". That got me one step further; it now blocks on "libcublas.so: wrong ELF class: ELFCLASS64".
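For anyone wanting to try the same workaround without touching the system install, the fake CUDA home can be sketched roughly as follows. The function name `make_cuda_shim` and both directory arguments are hypothetical, and how the shim is then handed to Julia (for instance through an environment variable such as CUDA_HOME) is an assumption that this thread does not confirm.

```shell
# make_cuda_shim REAL_CUDA_HOME SHIM_DIR
# Hypothetical helper: build SHIM_DIR as a directory of symlinks to the
# entries of REAL_CUDA_HOME, plus an extra "lib" link that points to lib64.
# REAL_CUDA_HOME should be an absolute path so the links resolve correctly.
make_cuda_shim() {
  real=$1
  shim=$2
  mkdir -p "$shim" || return 1
  # Mirror every top-level entry of the real install as a symlink.
  for entry in "$real"/*; do
    [ -e "$entry" ] || continue
    ln -sfn "$entry" "$shim/$(basename "$entry")"
  done
  # Add lib -> lib64 only when the real install has lib64 but no lib.
  if [ -d "$real/lib64" ] && [ ! -e "$real/lib" ]; then
    ln -sfn "$real/lib64" "$shim/lib"
  fi
}
```

Because the shim lives in a user-owned directory, it avoids the risk of changing the shared install for other users of the machine.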
... and it's resolved. The bug was: I had (inadvertently, but there you are) installed the 32-bit build of Julia on a 64-bit system. :-)
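A "wrong ELF class" message generally means a process of one word size tried to load a library of the other. One way to check for that kind of mismatch is to read the EI_CLASS byte of the ELF header, as in this sketch. The function name `elf_class` is made up for illustration, and the paths in the usage note are placeholders.

```shell
# elf_class FILE
# Hypothetical helper: print 32 or 64 depending on the ELF class of FILE,
# or "not-elf" (with a non-zero status) if FILE lacks the ELF magic bytes.
elf_class() {
  # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
  [ "$magic" = "7f454c46" ] || { echo "not-elf"; return 1; }
  # The byte at offset 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
  case "$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')" in
    1) echo 32 ;;
    2) echo 64 ;;
    *) echo unknown ;;
  esac
}
```

Running `elf_class "$(command -v julia)"` and `elf_class` on the libcublas.so from the CUDA install should print the same number; if they differ, the Julia build and the CUDA libraries do not match. From inside Julia, `Sys.WORD_SIZE` reports the word size of the running build.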
Ah yeah, CUDA doesn't even support 32-bit anymore, so we can't even support that use case with artifacts. Glad you got it fixed!
Hi
I just got access to a nice machine with plenty of GPUs but they don't seem to be available for Julia:
... so there should be plenty of hardware available. Having read a few other error reports about similar issues, I also tested this:
... but back to the main story and the error messages:
Following the advice to set the JULIA_CUDA_VERBOSE flag, I get this result:
... do you have any suggestions about what I should do next? It seems like the text:
... is at the crux of the problem, but I don't know how to fix it. Do you have any suggestions?