This repository was archived by the owner on May 27, 2021. It is now read-only.

Can't access GPUs, get "ERROR: CUDA error: invalid device context (code 201, ERROR_INVALID_CONTEXT)" #620

Closed
@la3lma

Description

Hi,

I just got access to a nice machine with plenty of GPUs, but they don't seem to be available from Julia:


$ nvidia-smi
Fri Apr  3 14:44:44 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:04:00.0 Off |                  N/A |
| 27%   24C    P8     1W / 250W |   1108MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:05:00.0 Off |                  N/A |
| 27%   24C    P8    21W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  On   | 00000000:06:00.0 Off |                  N/A |
| 27%   24C    P8    20W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  On   | 00000000:07:00.0 Off |                  N/A |
| 27%   25C    P8     1W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce RTX 208...  On   | 00000000:08:00.0 Off |                  N/A |
| 27%   24C    P8    20W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce RTX 208...  On   | 00000000:0B:00.0 Off |                  N/A |
| 27%   25C    P8    19W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce RTX 208...  On   | 00000000:0C:00.0 Off |                  N/A |
| 27%   25C    P8    19W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce RTX 208...  On   | 00000000:0D:00.0 Off |                  N/A |
| 27%   23C    P8    18W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   8  GeForce RTX 208...  On   | 00000000:0E:00.0 Off |                  N/A |
| 27%   26C    P8    21W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   9  GeForce RTX 208...  On   | 00000000:0F:00.0 Off |                  N/A |
| 27%   25C    P8     1W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     19629      C   /opt/conda/bin/python                        787MiB |
|    0     26698      C   ...geHD/userHome/rmz/julia-1.4.0/bin/julia   310MiB |
+-----------------------------------------------------------------------------+

... so there should be plenty of hardware available. Having read a few other error reports about similar issues, I also tested this:

 apt list | grep -i cupti

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libcupti-dev/bionic 9.1.85-3ubuntu1 amd64
libcupti-doc/bionic 9.1.85-3ubuntu1 all
libcupti9.1/bionic 9.1.85-3ubuntu1 amd64
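
For completeness, here is how I could check from within Julia which CUDA toolkit directories are visible on this box (just a rough sketch; the candidate paths are guesses for this machine):

# List plausible CUDA toolkit roots and the CUDA libraries they ship.
# The candidate directories are guesses, not confirmed install locations.
candidates = filter(isdir, ["/usr/local/cuda", "/usr/local/cuda-10.2", "/usr/lib/cuda", "/opt/cuda"])
for root in candidates
    libdir = joinpath(root, "lib64")
    libs = isdir(libdir) ? readdir(libdir) : String[]
    println(root, " => ", filter(f -> startswith(f, "libcuda"), libs))
end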

... but back to the main story and the error messages:

$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.0 (2020-03-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CuArrays
┌ Warning: Incompatibility detected between CUDA and LLVM 8.0+; disabling debug info emission for CUDA kernels
└ @ CUDAnative ~/.julia/packages/CUDAnative/hfulr/src/CUDAnative.jl:114
[ Info: CUDAnative.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)

julia> cu([1,2,3])
ERROR: CUDA error: invalid device context (code 201, ERROR_INVALID_CONTEXT)
Stacktrace:
 [1] throw_api_error(::CUDAdrv.cudaError_enum) at /storageHD/userHome/rmz/.julia/packages/CUDAdrv/b1mvw/src/error.jl:131
 [2] macro expansion at /storageHD/userHome/rmz/.julia/packages/CUDAdrv/b1mvw/src/error.jl:144 [inlined]
 [3] cuMemAlloc_v2 at /storageHD/userHome/rmz/.julia/packages/CUDAdrv/b1mvw/src/libcuda.jl:313 [inlined]
 [4] alloc(::Type{CUDAdrv.Mem.DeviceBuffer}, ::Int32) at /storageHD/userHome/rmz/.julia/packages/CUDAdrv/b1mvw/src/memory.jl:70
 [5] macro expansion at /storageHD/userHome/rmz/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [6] macro expansion at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory.jl:61 [inlined]
 [7] macro expansion at ./util.jl:234 [inlined]
 [8] actual_alloc(::Int32) at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory.jl:60
 [9] actual_alloc at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory/binned.jl:55 [inlined]
 [10] macro expansion at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory/binned.jl:198 [inlined]
 [11] macro expansion at /storageHD/userHome/rmz/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [12] pool_alloc(::Int32, ::Int32) at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory/binned.jl:197
 [13] (::CuArrays.BinnedPool.var"#12#13"{Int32,Int32,Set{CuArrays.BinnedPool.Block},Array{CuArrays.BinnedPool.Block,1}})() at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory/binned.jl:293
 [14] lock(::CuArrays.BinnedPool.var"#12#13"{Int32,Int32,Set{CuArrays.BinnedPool.Block},Array{CuArrays.BinnedPool.Block,1}}, ::ReentrantLock) at ./lock.jl:161
 [15] alloc(::Int32) at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory/binned.jl:292
 [16] macro expansion at /storageHD/userHome/rmz/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [17] macro expansion at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory.jl:159 [inlined]
 [18] macro expansion at ./util.jl:234 [inlined]
 [19] alloc at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/memory.jl:158 [inlined]
 [20] CuArray{Float32,1,P} where P(::UndefInitializer, ::Tuple{Int32}) at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/array.jl:92
 [21] CuArray at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/array.jl:100 [inlined]
 [22] similar at ./abstractarray.jl:671 [inlined]
 [23] convert at /storageHD/userHome/rmz/.julia/packages/GPUArrays/1wgPO/src/construction.jl:80 [inlined]
 [24] adapt_storage at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/array.jl:239 [inlined]
 [25] adapt_structure at /storageHD/userHome/rmz/.julia/packages/Adapt/m5jFF/src/Adapt.jl:9 [inlined]
 [26] adapt at /storageHD/userHome/rmz/.julia/packages/Adapt/m5jFF/src/Adapt.jl:6 [inlined]
 [27] cu(::Array{Int32,1}) at /storageHD/userHome/rmz/.julia/packages/CuArrays/A6GUx/src/array.jl:314
 [28] top-level scope at REPL[2]:1

julia> using CUDAdrv; CUDAdrv.CuDevice(0)
CuDevice(0): GeForce RTX 2080 Ti
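
I haven't enumerated all ten cards from Julia yet; a minimal sketch along these lines (assuming this CUDAdrv version exposes devices() and name(), which I haven't verified) should show whether the driver sees them all:

using CUDAdrv
# Compare what the driver reports against the ten cards nvidia-smi lists.
# devices() and name() are assumed to exist in this CUDAdrv version.
for dev in CUDAdrv.devices()
    println(dev, ": ", CUDAdrv.name(dev))
end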

Following the advice to set the JULIA_CUDA_VERBOSE flag, I get this result:

$  JULIA_CUDA_VERBOSE=true julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.0 (2020-03-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CuArrays
┌ Warning: Incompatibility detected between CUDA and LLVM 8.0+; disabling debug info emission for CUDA kernels
└ @ CUDAnative ~/.julia/packages/CUDAnative/hfulr/src/CUDAnative.jl:114
┌ Error: CUDAnative.jl failed to initialize
│   exception =
│    Your CUDA installation does not provide libcudadevrt
│    Stacktrace:
│     [1] error(::String) at ./error.jl:33
│     [2] __init__() at /storageHD/userHome/rmz/.julia/packages/CUDAnative/hfulr/src/CUDAnative.jl:146
│     [3] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:697
│     [4] _require_search_from_serialized(::Base.PkgId, ::String) at ./loading.jl:781
│     [5] _tryrequire_from_serialized(::Base.PkgId, ::UInt64, ::String) at ./loading.jl:712
│     [6] _require_search_from_serialized(::Base.PkgId, ::String) at ./loading.jl:770
│     [7] _require(::Base.PkgId) at ./loading.jl:1006
│     [8] require(::Base.PkgId) at ./loading.jl:927
│     [9] require(::Module, ::Symbol) at ./loading.jl:922
│     [10] eval(::Module, ::Any) at ./boot.jl:331
│     [11] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
│     [12] macro expansion at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:118 [inlined]
│     [13] (::REPL.var"#26#27"{REPL.REPLBackend})() at ./task.jl:358
└ @ CUDAnative ~/.julia/packages/CUDAnative/hfulr/src/CUDAnative.jl:190
┌ Warning: CuArrays.jl did not initialize because CUDAdrv.jl or CUDAnative.jl failed to
└ @ CuArrays ~/.julia/packages/CuArrays/A6GUx/src/CuArrays.jl:64

julia> 

... do you have any suggestions about what I should do next? It seems like the line:

Your CUDA installation does not provide libcudadevrt

... is at the crux of the problem, but I don't know how to amend it.
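
If it helps, I can also search the filesystem for libcudadevrt directly; a quick sketch (the prefixes below are only guesses at where CUDA might be installed here):

# Walk a few likely install prefixes looking for the device runtime library.
# libcudadevrt is a static archive (libcudadevrt.a), so a plain filesystem
# search is used rather than Libdl. The prefixes are guesses.
for prefix in ("/usr/local", "/usr/lib", "/opt")
    isdir(prefix) || continue
    for (root, _, files) in walkdir(prefix; onerror = e -> nothing)
        for f in files
            startswith(f, "libcudadevrt") && println(joinpath(root, f))
        end
    end
end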
