InvalidIRError in CUDA integration tests on non-CUDA machines #614

Open
giordano opened this issue Jan 25, 2025 · 9 comments
Labels
bug Something isn't working

Comments

@giordano
Member

giordano commented Jan 25, 2025

After upgrading to Julia v1.11.3, the CUDA integration tests fail with:

Square Kernel: Error During Test at /home/runner/work/Reactant.jl/Reactant.jl/test/integration/cuda.jl:26
  Got exception outside of a @test
  InvalidIRError: compiling MethodInstance for Main.var"##CUDA#245".square_kernel!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}) resulted in invalid LLVM IR
  Reason: unsupported dynamic function invocation (call to throw_boundserror() @ CUDA ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:51)
  Stacktrace:
   [1] #throw_boundserror
     @ ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:53
   [2] checkbounds
     @ ./abstractarray.jl:699
   [3] #arrayref
     @ ~/work/Reactant.jl/Reactant.jl/ext/ReactantCUDAExt.jl:62
   [4] getindex
     @ ~/work/Reactant.jl/Reactant.jl/ext/ReactantCUDAExt.jl:151
   [5] square_kernel!
     @ ~/work/Reactant.jl/Reactant.jl/test/integration/cuda.jl:8

See, e.g., the CI job "Julia 1.11 - integration - ubuntu-20.04 - x64 - packaged libReactant - assertions=false - push".

I'm going to skip these tests to keep CI greener, but this issue still needs to be resolved.

giordano added the bug (Something isn't working) label Jan 25, 2025
@wsmoses
Member

wsmoses commented Jan 26, 2025

@vchuravy @maleadt I'm not sure why, but I think CUDA.jl on 1.11 is hitting an error in its own error handling.

In particular, Julia decides to emit a dynamic (non-inlined) invoke of the zero-argument `CUDA.throw_boundserror()` instead of inlining it. Inlining would let the subsequent LLVM code be converted to an `exit` asm instruction; as it is, GPUCompiler rejects the call as an unsupported dynamic function invocation.
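The pattern described above (a zero-argument throwing helper that survives as a separate `invoke` instead of being inlined) can be inspected without a GPU. Below is a CPU-only sketch in plain Julia; `throw_oob` and `getval` are hypothetical stand-ins, not CUDA.jl or Reactant.jl code. It only shows how to look for such an `invoke` in optimized typed IR, not how GPUCompiler handles it.

```julia
# CPU-only sketch; `throw_oob` and `getval` are hypothetical stand-ins,
# not CUDA.jl or Reactant.jl code.

# A zero-argument helper that always throws and is never inlined,
# similar in shape to CUDA.throw_boundserror().
@noinline throw_oob() = throw(BoundsError())

function getval(v::Vector{Int}, i::Int)
    # Manual bounds check that calls the out-of-line helper on failure
    checkbounds(Bool, v, i) || throw_oob()
    return @inbounds v[i]
end

# Optimized typed IR (CodeInfo => return type) for getval(::Vector{Int}, ::Int)
ci, rettype = only(code_typed(getval, (Vector{Int}, Int); optimize = true))

# Statements that remain as direct `invoke` calls after optimization
invokes = filter(st -> Meta.isexpr(st, :invoke), ci.code)

# true if the throwing helper survived as an `invoke`, i.e. it was not inlined
helper_not_inlined = any(st -> occursin("throw_oob", string(st)), invokes)
```

On the CPU such a call is harmless because the host runtime can execute it; in this issue the device compiler instead fails on the equivalent `invoke CUDA.throw_boundserror()::Union{}` statements visible in the Cthulhu output.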

# Using Julia 1.11 (run at the REPL)

using Reactant, CUDA, GPUCompiler, Cthulhu

# Kernel: each thread multiplies one element of x by the matching element of y
function square_kernel!(x, y)
    i = threadIdx().x
    x[i] *= y[i]
    sync_threads()
    return nothing
end

# Launch a single block with one thread per element of x
function square!(x, y)
    @cuda blocks = 1 threads = length(x) square_kernel!(x, y)
    return nothing
end

oA = collect(1:1:64);
A = Reactant.to_rarray(oA);
B = Reactant.to_rarray(100 .* oA);
@jit square!(A, B)  # throws the InvalidIRError shown below
# At the REPL, `err` holds the ExceptionStack of the last thrown error
a = err
"""
1-element ExceptionStack:
InvalidIRError: compiling MethodInstance for square_kernel!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to throw_boundserror() @ CUDA ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:51)
Stacktrace:
 [1] #throw_boundserror
   @ ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:53
 [2] checkbounds
   @ ./abstractarray.jl:699
 [3] #arrayref
   @ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:62
 [4] getindex
   @ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:151
 [5] square_kernel!
   @ ./REPL[5]:3
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] (::ReactantCUDAExt.var"#7#10"{CompilerJob{PTXCompilerTarget, CUDA.CUDACompilerParams}})(ctx::LLVM.Context)
    @ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:402
  [2] JuliaContext(f::ReactantCUDAExt.var"#7#10"{CompilerJob{PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Nxf8r/src/driver.jl:34
  [3] JuliaContext
    @ ~/.julia/packages/GPUCompiler/Nxf8r/src/driver.jl:25 [inlined]
  [4] compile(job::CompilerJob{PTXCompilerTarget, CUDA.CUDACompilerParams})
    @ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:338
  [5] actual_compilation(cache::Dict{Any, ReactantCUDAExt.LLVMFunc}, src::Core.MethodInstance, world::UInt64, cfg::CompilerConfig{PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(ReactantCUDAExt.compile), linker::typeof(ReactantCUDAExt.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Nxf8r/src/execution.jl:237
  [6] cached_compilation(cache::Dict{Any, ReactantCUDAExt.LLVMFunc}, src::Core.MethodInstance, cfg::CompilerConfig{PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Nxf8r/src/execution.jl:151
  [7] macro expansion
    @ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:753 [inlined]
  [8] macro expansion
    @ ./lock.jl:273 [inlined]
  [9] cufunction(f::typeof(square_kernel!), tt::Type{Tuple{ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}}}; kwargs::@Kwargs{})
    @ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:731
 [10] #cufunction
    @ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:728 [inlined]
 [11] cufunction(none::typeof(square_kernel!), none::Type{Tuple{ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}}})
    @ Reactant ./<missing>:0
 [12] #cufunction
    @ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:728 [inlined]
 [13] call_with_reactant(::typeof(cufunction), ::typeof(square_kernel!), ::Type{Tuple{ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}}})
    @ Reactant ~/git/Reactant.jl/src/utils.jl:0
 [14] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/src/compiler/execution.jl:112 [inlined]
 [15] square!
    @ ./REPL[6]:2 [inlined]
 [16] square!(none::Reactant.TracedRArray{Int64, 1}, none::Reactant.TracedRArray{Int64, 1})
    @ Reactant ./<missing>:0
 [17] macro expansion
    @ ~/.julia/packages/CUDA/1kIOw/src/compiler/execution.jl:108 [inlined]
 [18] square!
    @ ./REPL[6]:2 [inlined]
 [19] call_with_reactant(::typeof(square!), ::Reactant.TracedRArray{Int64, 1}, ::Reactant.TracedRArray{Int64, 1})
    @ Reactant ~/git/Reactant.jl/src/utils.jl:0
 [20] make_mlir_fn(f::Function, args::Tuple{ConcreteRArray{Int64, 1}, ConcreteRArray{Int64, 1}}, kwargs::Tuple{}, name::String, concretein::Bool; toscalar::Bool, return_dialect::Symbol, do_transpose::Bool, no_args_in_result::Bool)
    @ Reactant.TracedUtils ~/git/Reactant.jl/src/TracedUtils.jl:216
 [21] make_mlir_fn
    @ ~/git/Reactant.jl/src/TracedUtils.jl:129 [inlined]
 [22] compile_mlir!(mod::Reactant.MLIR.IR.Module, f::Function, args::Tuple{ConcreteRArray{Int64, 1}, ConcreteRArray{Int64, 1}}; optimize::Bool, no_nan::Bool)
    @ Reactant.Compiler ~/git/Reactant.jl/src/Compiler.jl:441
 [23] compile_mlir!
    @ ~/git/Reactant.jl/src/Compiler.jl:432 [inlined]
 [24] compile_xla(f::Function, args::Tuple{ConcreteRArray{Int64, 1}, ConcreteRArray{Int64, 1}}; client::Nothing, optimize::Bool, no_nan::Bool, device::Nothing)
    @ Reactant.Compiler ~/git/Reactant.jl/src/Compiler.jl:982
 [25] compile_xla
    @ ~/git/Reactant.jl/src/Compiler.jl:972 [inlined]
 [26] compile(f::Function, args::Tuple{ConcreteRArray{Int64, 1}, ConcreteRArray{Int64, 1}}; sync::Bool, kwargs::@Kwargs{client::Nothing, no_nan::Bool, device::Nothing, optimize::Bool})
    @ Reactant.Compiler ~/git/Reactant.jl/src/Compiler.jl:1047
 [27] top-level scope
    @ ~/git/Reactant.jl/src/Compiler.jl:695
"""
code_typed(a.stack[1].exception, interactive=true)

"""
julia> code_typed(a.stack[1].exception, interactive=true)
square_kernel!(x, y) @ Main REPL[18]:1
  ∘ ── %0 = invoke square_kernel!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)})::Core.Const(nothing)       
2 1 ── %1  = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range !0\n  ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 1023}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range !0\n  ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 1023}\n", "entry"))
  │    %2  = Base.llvmcall(%1, Int32, Tuple{})::Int32                                                                                                      ││┃│││       threadIdx_x
  │    %3  = Base.add_int(%2, 1)::Int32                                                                                                                    │││╻          +
  │    %4  = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.y(), !range !0\n  ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.y() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 1023}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.y(), !range !0\n  ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.y() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 1023}\n", "entry"))
  │          Base.llvmcall(%4, Int32, Tuple{})::Int32                                                                                                      ││││┃│         macro expansion
  │    %6  = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.z(), !range !0\n  ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.z() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 63}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n  %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.z(), !range !0\n  ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.z() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 63}\n", "entry"))
  │          Base.llvmcall(%6, Int32, Tuple{})::Int32                                                                                                      ││││┃│         macro expansion
3 │    %8  = $(Expr(:boundscheck, true))::Bool                                                                                                             │╻╷         getindex
  └───       goto #4 if not %8                                                                                                                             ││┃          #arrayref
  2 ── %10 = Core.sext_int(Core.Int64, %3)::Int64                                                                                                          │││╻╷╷╷╷╷╷╷   checkbounds
  │    %11 = Base.sle_int(1, %10)::Bool                                                                                                                    ││││╻          checkbounds
  │    %12 = Core.sext_int(Core.Int64, %3)::Int64                                                                                                          │││││╻╷╷╷╷      checkindex
  │    %13 = Base.sle_int(%12, 64)::Bool                                                                                                                   ││││││╻          <=
  │    %14 = Base.and_int(%11, %13)::Bool                                                                                                                  ││││││╻          &
  └───       goto #11 if not %14                                                                                                                           ││││       
  3 ──       nothing::Nothing                                                                                                                              │          
  4 ┄─ %17 = Base.getfield(x, :ptr)::Core.LLVMPtr{Int64, 1}                                                                                                │││╻╷╷╷       arrayref_bits
  │    %18 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i64 @entry(i8 addrspace(1)* %0, i32 %1) #0 {\nentry:\n  %2 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n  %3 = getelementptr inbounds i64, i64 addrspace(1)* %2, i32 %1\n  %4 = load i64, i64 addrspace(1)* %3, align 8, !tbaa !0\n  ret i64 %4\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i64 @entry(i8 addrspace(1)* %0, i32 %1) #0 {\nentry:\n  %2 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n  %3 = getelementptr inbounds i64, i64 addrspace(1)* %2, i32 %1\n  %4 = load i64, i64 addrspace(1)* %3, align 8, !tbaa !0\n  ret i64 %4\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry"))
  │    %19 = Base.sub_int(%3, 1)::Int32                                                                                                                    │││││┃│││       pointerref
  │    %20 = Base.llvmcall(%18, Int64, Tuple{Core.LLVMPtr{Int64, 1}, Int32}, %17, %19)::Int64                                                              ││││││┃│         macro expansion
  │    %21 = $(Expr(:boundscheck, true))::Bool                                                                                                             ││╻          #arrayref
  └───       goto #7 if not %21                                                                                                                            │││        
  5 ── %23 = Core.sext_int(Core.Int64, %3)::Int64                                                                                                          │││╻╷╷╷╷╷╷╷   checkbounds
  │    %24 = Base.sle_int(1, %23)::Bool                                                                                                                    ││││╻          checkbounds
  │    %25 = Core.sext_int(Core.Int64, %3)::Int64                                                                                                          │││││╻╷╷╷╷      checkindex
  │    %26 = Base.sle_int(%25, 64)::Bool                                                                                                                   ││││││╻          <=
  │    %27 = Base.and_int(%24, %26)::Bool                                                                                                                  ││││││╻          &
  └───       goto #12 if not %27                                                                                                                           ││││       
  6 ──       nothing::Nothing                                                                                                                              │          
  7 ┄─ %30 = Base.getfield(y, :ptr)::Core.LLVMPtr{Int64, 1}                                                                                                │││╻╷╷╷       arrayref_bits
  │    %31 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i64 @entry(i8 addrspace(1)* %0, i32 %1) #0 {\nentry:\n  %2 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n  %3 = getelementptr inbounds i64, i64 addrspace(1)* %2, i32 %1\n  %4 = load i64, i64 addrspace(1)* %3, align 8, !tbaa !0\n  ret i64 %4\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i64 @entry(i8 addrspace(1)* %0, i32 %1) #0 {\nentry:\n  %2 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n  %3 = getelementptr inbounds i64, i64 addrspace(1)* %2, i32 %1\n  %4 = load i64, i64 addrspace(1)* %3, align 8, !tbaa !0\n  ret i64 %4\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry"))
  │    %32 = Base.sub_int(%3, 1)::Int32                                                                                                                    │││││┃│││       pointerref
  │    %33 = Base.llvmcall(%31, Int64, Tuple{Core.LLVMPtr{Int64, 1}, Int32}, %30, %32)::Int64                                                              ││││││┃│         macro expansion
  │    %34 = Base.mul_int(%20, %33)::Int64                                                                                                                 │╻          *
  │    %35 = $(Expr(:boundscheck, true))::Bool                                                                                                             ││╻          #arrayset
  └───       goto #10 if not %35                                                                                                                           │││        
  8 ── %37 = Core.sext_int(Core.Int64, %3)::Int64                                                                                                          │││╻╷╷╷╷╷╷╷   checkbounds
  │    %38 = Base.sle_int(1, %37)::Bool                                                                                                                    ││││╻          checkbounds
  │    %39 = Core.sext_int(Core.Int64, %3)::Int64                                                                                                          │││││╻╷╷╷╷      checkindex
  │    %40 = Base.sle_int(%39, 64)::Bool                                                                                                                   ││││││╻          <=
  │    %41 = Base.and_int(%38, %40)::Bool                                                                                                                  ││││││╻          &
  └───       goto #13 if not %41                                                                                                                           ││││       
  9 ──       nothing::Nothing                                                                                                                              │          
  10 ┄ %44 = Base.getfield(x, :ptr)::Core.LLVMPtr{Int64, 1}                                                                                                │││╻╷╷╷       arrayset_bits
  │    %45 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine void @entry(i8 addrspace(1)* %0, i64 %1, i32 %2) #0 {\nentry:\n  %3 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n  %4 = getelementptr inbounds i64, i64 addrspace(1)* %3, i32 %2\n  store i64 %1, i64 addrspace(1)* %4, align 8, !tbaa !0\n  ret void\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine void @entry(i8 addrspace(1)* %0, i64 %1, i32 %2) #0 {\nentry:\n  %3 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n  %4 = getelementptr inbounds i64, i64 addrspace(1)* %3, i32 %2\n  store i64 %1, i64 addrspace(1)* %4, align 8, !tbaa !0\n  ret void\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry"))
  │    %46 = Base.sub_int(%3, 1)::Int32                                                                                                                    │││││┃│││       pointerset
  │          Base.llvmcall(%45, Nothing, Tuple{Core.LLVMPtr{Int64, 1}, Int64, Int32}, %44, %34, %46)::Core.Const(nothing)                                  ││││││┃│         macro expansion
4 │          $(Expr(:foreigncall, "llvm.nvvm.barrier0", Nothing, svec(), 0, :(:llvmcall)))::Core.Const(nothing)                                            │╻          sync_threads
5 └───       return Main.nothing                                                                                                                           │          
3 11 ─       invoke CUDA.throw_boundserror()::Union{}                                                                                                      │╻╷╷╷       getindex
  └───       unreachable                                                                                                                                   ││┃││        #arrayref
  12 ─       invoke CUDA.throw_boundserror()::Union{}                                                                                                      ││╻╷╷        #arrayref
  └───       unreachable                                                                                                                                   │││┃│         checkbounds
  13 ─       invoke CUDA.throw_boundserror()::Union{}                                                                                                      ││╻╷╷        #arrayset
  └───       unreachable                                                                                                                                   │││┃│         checkbounds
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
 • %50 = invoke throw_boundserror()::Union{}
   %52 = invoke throw_boundserror()::Union{}
   %54 = invoke throw_boundserror()::Union{}


square_kernel!(x, y) @ Main REPL[18]:1
Variables
  #self#::Core.Const(Main.square_kernel!)
  x::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
  y::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
  i::Int32

∘ ─ %0 = invoke square_kernel!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)})::Core.Const(nothing)
    @ REPL[18]:2 within `square_kernel!`
1 ─ %1 = Main.threadIdx()::@NamedTuple{x::Int32, y::Int32, z::Int32}
│        (i = Base.getproperty(%1, :x))::Int32
│   @ REPL[18]:3 within `square_kernel!`
│   %3 = Main.:*::Core.Const(*)
│   %4 = i::Int32
│   %5 = Base.getindex(x, %4)::Int64
│   %6 = i::Int32
│   %7 = Base.getindex(y, %6)::Int64
│   %8 = (%3)(%5, %7)::Int64
│   %9 = i::Int32
│        Base.setindex!(x, %8, %9)::Any
│   @ REPL[18]:4 within `square_kernel!`
│        Main.sync_threads()::Any
│   @ REPL[18]:5 within `square_kernel!`
└──      return Main.nothing
   %1 = #threadIdx()::@NamedTuple{x::Int32, y::Int32, z::Int32}
   %2 =  = < semi-concrete eval > getproperty(::@NamedTuple{x::Int32, y::Int32, z::Int32},::Core.Const(:x))::Int32
 • %5 = getindex(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
   %7 = getindex(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
   %8 = *(::Int64,::Int64)::Int64
   %10 = setindex!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int64,::Int32)::Any
   %11 = sync_threads()::Any

getindex(A::ReactantCUDAExt.CuTracedArray{T}, i1::Integer) where T @ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:151
Variables
  #self#::Core.Const(getindex)
  A::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
  i1::Int32

∘ ─ %0 = invoke getindex(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
1 ─      nothing::Core.Const(nothing)
│   @ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:151 within `getindex`
│   %2 = ReactantCUDAExt.arrayref(A, i1)::Int64
└──      return %2
 • %2 = #arrayref(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64

arrayref(A::ReactantCUDAExt.CuTracedArray{T}, index::Integer) where T @ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:59
Variables
  #self#::Core.Const(ReactantCUDAExt.arrayref)
  A::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
  index::Int32

∘ ─ %0 = invoke #arrayref(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
1 ─       nothing::Core.Const(nothing)
│   @ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:62 within `#arrayref`
│   %2  = $(Expr(:boundscheck))::Bool
└──       goto #3 if not %2
2 ─ %4  = ReactantCUDAExt.checkbounds::Core.Const(checkbounds)
└──       (%4)(A, index)::Any
    @ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:63 within `#arrayref`
3 ┄ %6  = Base.isbitsunion::Core.Const(Base.isbitsunion)
│   %7  = $(Expr(:static_parameter, 1))::Core.Const(Int64)
│   %8  = (%6)(%7)::Core.Const(false)
└──       goto #5 if not %8
    @ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:64 within `#arrayref`
4 ─       Core.Const(:(ReactantCUDAExt.arrayref_union))::Union{}
│         Core.Const(:((%10)(A, index)))::Union{}
└──       Core.Const(:(return %11))::Union{}
    @ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:66 within `#arrayref`
5 ┄ %13 = ReactantCUDAExt.arrayref_bits::Core.Const(ReactantCUDAExt.arrayref_bits)
│   %14 = (%13)(A, index)::Int64
└──       return %14
 • %5 = checkbounds(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Any
   %8 = isbitsunion(::Type{Int64})::Core.Const(false)
   %14 = arrayref_bits(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64

checkbounds(A::AbstractArray, I...) @ Base abstractarray.jl:697
Variables
  #self#::Core.Const(checkbounds)
  A::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
  I::Tuple{Int32}

∘ ─ %0 = invoke checkbounds(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Core.Const(nothing)
    @ abstractarray.jl:698 within `checkbounds`
1 ─      nothing::Core.Const(nothing)
│   @ abstractarray.jl:699 within `checkbounds`
│   %2 = Base.checkbounds::Core.Const(checkbounds)
│   %3 = Core.tuple(Base.Bool, A)::Core.PartialStruct(Tuple{DataType, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}}, Any[Type{Bool}, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}])
│   %4 = Core._apply_iterate(Base.iterate, %2, %3, I)::Bool
└──      goto #3 if not %4
2 ─      goto #4
3 ─      Base.throw_boundserror(A, I)::Union{}
4 ┄      return Base.nothing
   %4 = checkbounds(::Type{Bool},::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Bool
 • %7 = #throw_boundserror(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Tuple{Int32})::Union{}

throw_boundserror(A, I) @ CUDA ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:53
Variables
  #self#::Core.Const(Base.throw_boundserror)
  A::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
  I::Tuple{Int32}

∘ ─ %0 = invoke #throw_boundserror(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Tuple{Int32})::Union{}
1 ─     nothing::Core.Const(nothing)
│   @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:53 within `#throw_boundserror`
│       CUDA.throw_boundserror()::Union{}
└──     Core.Const(:(return %2))::Union{}
 • %2 = throw_boundserror()::Union{}

throw_boundserror() @ CUDA ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:51
Variables
  #self#::Core.Const(CUDA.throw_boundserror)
  info::Ptr{CUDA.ExceptionInfo_st}

∘ ─ %0 = invoke throw_boundserror()::Union{}
1 ─      nothing::Core.Const(nothing)
│   @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:12 within `throw_boundserror`
│   %2 = CUDA.kernel_state()::CUDA.KernelState
│        (info = Base.getproperty(%2, :exception_info))::Ptr{CUDA.ExceptionInfo_st}
│   @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:13 within `throw_boundserror`
│   %4 = CUDA._strptr(Val{:BoundsError}())::Ptr{UInt8}
│   %5 = info::Ptr{CUDA.ExceptionInfo_st}
│        Base.setproperty!(%5, :subtype, %4)::Any
│   @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:14 within `throw_boundserror`
│   %7 = CUDA._strptr(Val{Symbol("Out-of-bounds array access")}())::Ptr{UInt8}
│   %8 = info::Ptr{CUDA.ExceptionInfo_st}
│        Base.setproperty!(%8, :reason, %7)::Any
│   @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:15 within `throw_boundserror`
│        CUDA.throw(CUDA.nothing)::Union{}
└──      Core.Const(:(return %10))::Union{}
 • %2 = kernel_state()::CUDA.KernelState
   %3 =  = < semi-concrete eval > getproperty(::CUDA.KernelState,::Core.Const(:exception_info))::Ptr{CUDA.ExceptionInfo_st}
   %4 = _strptr(::Val{:BoundsError})::Ptr{UInt8}
   %6 = setproperty!(::Ptr{CUDA.ExceptionInfo_st},::Symbol,::Ptr{UInt8})::Any
   %7 = _strptr(::Val{Symbol("Out-of-bounds array access")})::Ptr{UInt8}
   %9 = setproperty!(::Ptr{CUDA.ExceptionInfo_st},::Symbol,::Ptr{UInt8})::Any


;  @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:51 within `throw_boundserror`
; Function Attrs: noinline noreturn
define swiftcc void @julia_throw_boundserror_37034(ptr nonnull swiftself %pgcstack_arg) #0 {
top:
  %0 = alloca { i64, i32 }, align 8
  %pgcstack = call ptr @julia.get_pgcstack()
  %current_task = getelementptr inbounds ptr, ptr %pgcstack, i64 -14
  %world_age = getelementptr inbounds i64, ptr %current_task, i64 15
  %current_task1 = getelementptr inbounds ptr, ptr %pgcstack, i64 -14
  %ptls_field = getelementptr inbounds ptr, ptr %current_task1, i64 16
  %ptls_load = load ptr, ptr %ptls_field, align 8
  %1 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
  %safepoint = load ptr, ptr %1, align 8
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(ptr %safepoint)
  fence syncscope("singlethread") seq_cst
;  @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:12 within `throw_boundserror`
; ┌ @ none within `kernel_state`
; │┌ @ none within `macro expansion` @ /home/wmoses/.julia/packages/LLVM/b3kFs/src/interop/base.jl:39
    %2 = load ptr, ptr @"*Core.tuple#37036", align 8
    %3 = getelementptr inbounds ptr, ptr %2, i64 0
    %4 = load ptr, ptr @"jl_global#37037", align 8
    %5 = insertvalue [2 x ptr] zeroinitializer, ptr %4, 0
    %6 = load ptr, ptr @"jl_global#37038", align 8
    %7 = insertvalue [2 x ptr] %5, ptr %6, 1
    %8 = load ptr, ptr @"*Core.Intrinsics.llvmcall#37039", align 8
    %9 = getelementptr inbounds ptr, ptr %8, i64 0
    %10 = call { i64, i32 } @julia_throw_boundserror_37034u37040()
    store { i64, i32 } %10, ptr %0, align 8
; └└
  %11 = load ptr, ptr @"*Main.Base.getproperty#37041", align 8
  %12 = getelementptr inbounds ptr, ptr %11, i64 0
  %13 = load ptr, ptr @"jl_global#37042", align 8
  %14 = load ptr, ptr @"+CUDA.KernelState#37043", align 8
  %15 = ptrtoint ptr %14 to i64
  %16 = inttoptr i64 %15 to ptr
  %current_task2 = getelementptr inbounds ptr, ptr %pgcstack, i64 -14
  %17 = call noalias nonnull align 8 dereferenceable(16) ptr @julia.gc_alloc_obj(ptr %current_task2, i64 16, ptr %16) #9
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %17, ptr align 8 %0, i64 16, i1 false)
  %18 = load ptr, ptr @"jl_sym#exception_info#37044", align 8
  %19 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %13, ptr %17, ptr %18)
;  @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:13 within `throw_boundserror`
  %20 = load ptr, ptr @"*CUDA._strptr#37045", align 8
  %21 = getelementptr inbounds ptr, ptr %20, i64 0
  %22 = load ptr, ptr @"jl_global#37046", align 8
  %23 = load ptr, ptr @"jl_global#37047", align 8
  %24 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %22, ptr %23)
  %25 = load ptr, ptr @"*Main.Base.setproperty!#37048", align 8
  %26 = getelementptr inbounds ptr, ptr %25, i64 0
  %27 = load ptr, ptr @"jl_global#37049", align 8
  %28 = load ptr, ptr @"jl_sym#subtype#37050", align 8
  %29 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %27, ptr %19, ptr %28, ptr %24)
;  @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:14 within `throw_boundserror`
  %30 = load ptr, ptr @"*CUDA._strptr#37045", align 8
  %31 = getelementptr inbounds ptr, ptr %30, i64 0
  %32 = load ptr, ptr @"jl_global#37046", align 8
  %33 = load ptr, ptr @"jl_global#37051", align 8
  %34 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %32, ptr %33)
  %35 = load ptr, ptr @"*Main.Base.setproperty!#37048", align 8
  %36 = getelementptr inbounds ptr, ptr %35, i64 0
  %37 = load ptr, ptr @"jl_global#37049", align 8
  %38 = load ptr, ptr @"jl_sym#reason#37052", align 8
  %39 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %37, ptr %19, ptr %38, ptr %34)
;  @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:15 within `throw_boundserror`
  %40 = load ptr, ptr @"*Core.throw#37053", align 8
  %41 = getelementptr inbounds ptr, ptr %40, i64 0
  %42 = load ptr, ptr @"*Core.nothing#37054", align 8
  %43 = getelementptr inbounds ptr, ptr %42, i64 0
  %44 = load ptr, ptr @jl_nothing, align 8
  call void @ijl_throw(ptr %44)
  unreachable

after_throw:                                      ; No predecessors!
  call void @llvm.trap()
  unreachable

after_noret:                                      ; No predecessors!
  call void @llvm.trap()
  unreachable
}


"""

@wsmoses
Member

wsmoses commented Jan 26, 2025

Changing CUDA.throw_boundserror from @noinline to @inline resolves the error; see JuliaGPU/CUDA.jl#2633.

@giordano
Member Author

Perhaps I'm wrong, but my understanding was that the @noinline is there to force users to resolve bounds checks at compile time. Users can do this either by writing code that lets the compiler eliminate the checks automatically, or by adding @inbounds annotations when they are sure the code will always be used safely. However, putting this macro in front of the indexing expressions doesn't seem to help.
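Here is a minimal CPU-only sketch of how @inbounds interacts with a custom array type. It uses a hypothetical Wrapped type, not Reactant's actual CuTracedArray. A caller's @inbounds can only remove a check that the callee wraps in @boundscheck, and only after the callee has been inlined into the caller, which Base.@propagate_inbounds requests. If an indexing method calls checkbounds unconditionally, the throwing path stays in the code regardless of annotations at the call site.

```julia
# Hypothetical wrapper type standing in for a custom device array
# (illustrative only; not the real ReactantCUDAExt.CuTracedArray).
struct Wrapped{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(w::Wrapped) = size(w.data)
Base.IndexStyle(::Type{<:Wrapped}) = IndexLinear()

# @propagate_inbounds inlines this method and forwards the caller's
# @inbounds context, so the @boundscheck block below can be elided.
Base.@propagate_inbounds function Base.getindex(w::Wrapped, i::Int)
    @boundscheck checkbounds(w, i)  # throws BoundsError unless elided
    return @inbounds w.data[i]
end

# Caller-side annotation: removes the check above only when bounds
# checking is not forced on for the whole session.
function sum_unchecked(w::Wrapped)
    s = zero(eltype(w))
    for i in eachindex(w)
        s += @inbounds w[i]
    end
    return s
end

w = Wrapped([1, 2, 3])
```

Julia ignores @inbounds entirely when started with --check-bounds=yes, which is how Pkg.test runs test suites by default.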

@wsmoses
Member

wsmoses commented Jan 26, 2025

I mean, if that's the case, then the code here is itself invalid (since it doesn't contain an @inbounds), and that was only caught on 1.11. CUDA.jl, however, supports such errors and will even print the error message, so that feels wrong.

@vchuravy
Member

What is the device_code dir=all output?

@vchuravy
Member

The core issue seems to be:

  %22 = load ptr, ptr @"jl_global#37046", align 8
  %23 = load ptr, ptr @"jl_global#37047", align 8
  %24 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %22, ptr %23)

What are those globals?
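The calls highlighted above go through ijl_apply_generic, the Julia runtime's entry point for dynamic dispatch. GPUCompiler rejects such calls in device code, and that rejection is what the "unsupported dynamic function invocation" error reports. Below is a minimal CPU-only sketch of how a type-unstable call shows up this way in LLVM IR. The names BOX, unstable, stable, llvm_text and has_dynamic_call are hypothetical and used only for illustration.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical global whose contents are typed Any: the compiler cannot
# know which `+` method applies, so it emits a runtime (dynamic) dispatch.
const BOX = Ref{Any}(1)
unstable() = BOX[] + 1

# Concretely typed argument: `+` is resolved at compile time.
stable(x::Int) = x + 1

# Capture the optimized LLVM IR of f for the given argument types as a string.
function llvm_text(f, types)
    io = IOBuffer()
    code_llvm(io, f, types)
    return String(take!(io))
end

# Dynamic dispatch appears as a call to the runtime's generic-apply
# function (ijl_apply_generic on recent Julia versions).
has_dynamic_call(f, types) = occursin("apply_generic", llvm_text(f, types))
```

The stable version compiles to a plain integer add, while the unstable one keeps a call into the runtime. Calls like that are what the device compiler cannot lower.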

@vchuravy
Member

But note that Cthulhu is likely lying to you about the LLVM IR; see JuliaDebug/Cthulhu.jl#510.

@wsmoses
Member

wsmoses commented Jan 27, 2025

cc @gbaraldi

@giordano
Member Author

giordano commented Feb 1, 2025

I can confirm JuliaLang/julia#57224 fixes this for me (I tested it in the backport PR: JuliaLang/julia#57183). We can close this after v1.11.4 is released.
