InvalidIRError in CUDA integration tests on non-CUDA machines #614
@vchuravy @maleadt I'm not sure why, but I think CUDA.jl on 1.11 is hitting an error (in its error handling). In particular, Julia decides to emit a type-unstable invoke of `CUDA.throw_boundserror()` [no arguments] instead of inlining it, which prevents the subsequent LLVM code from being converted to an `exit` asm instruction.

# Using Julia 1.11
using CUDA, Reactant, Test, Revise, GPUCompiler, Cthulhu
function square_kernel!(x, y)
i = threadIdx().x
x[i] *= y[i]
sync_threads()
return nothing
end
# basic squaring on GPU
function square!(x, y)
@cuda blocks = 1 threads = length(x) square_kernel!(x, y)
return nothing
end
oA = collect(1:1:64);
A = Reactant.to_rarray(oA);
B = Reactant.to_rarray(100 .* oA);
@jit square!(A, B)
a = err
"""
1-element ExceptionStack:
InvalidIRError: compiling MethodInstance for square_kernel!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to throw_boundserror() @ CUDA ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:51)
Stacktrace:
[1] #throw_boundserror
@ ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:53
[2] checkbounds
@ ./abstractarray.jl:699
[3] #arrayref
@ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:62
[4] getindex
@ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:151
[5] square_kernel!
@ ./REPL[5]:3
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
[1] (::ReactantCUDAExt.var"#7#10"{CompilerJob{PTXCompilerTarget, CUDA.CUDACompilerParams}})(ctx::LLVM.Context)
@ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:402
[2] JuliaContext(f::ReactantCUDAExt.var"#7#10"{CompilerJob{PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/Nxf8r/src/driver.jl:34
[3] JuliaContext
@ ~/.julia/packages/GPUCompiler/Nxf8r/src/driver.jl:25 [inlined]
[4] compile(job::CompilerJob{PTXCompilerTarget, CUDA.CUDACompilerParams})
@ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:338
[5] actual_compilation(cache::Dict{Any, ReactantCUDAExt.LLVMFunc}, src::Core.MethodInstance, world::UInt64, cfg::CompilerConfig{PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(ReactantCUDAExt.compile), linker::typeof(ReactantCUDAExt.link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/Nxf8r/src/execution.jl:237
[6] cached_compilation(cache::Dict{Any, ReactantCUDAExt.LLVMFunc}, src::Core.MethodInstance, cfg::CompilerConfig{PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/Nxf8r/src/execution.jl:151
[7] macro expansion
@ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:753 [inlined]
[8] macro expansion
@ ./lock.jl:273 [inlined]
[9] cufunction(f::typeof(square_kernel!), tt::Type{Tuple{ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}}}; kwargs::@Kwargs{})
@ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:731
[10] #cufunction
@ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:728 [inlined]
[11] cufunction(none::typeof(square_kernel!), none::Type{Tuple{ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}}})
@ Reactant ./<missing>:0
[12] #cufunction
@ ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:728 [inlined]
[13] call_with_reactant(::typeof(cufunction), ::typeof(square_kernel!), ::Type{Tuple{ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}}})
@ Reactant ~/git/Reactant.jl/src/utils.jl:0
[14] macro expansion
@ ~/.julia/packages/CUDA/1kIOw/src/compiler/execution.jl:112 [inlined]
[15] square!
@ ./REPL[6]:2 [inlined]
[16] square!(none::Reactant.TracedRArray{Int64, 1}, none::Reactant.TracedRArray{Int64, 1})
@ Reactant ./<missing>:0
[17] macro expansion
@ ~/.julia/packages/CUDA/1kIOw/src/compiler/execution.jl:108 [inlined]
[18] square!
@ ./REPL[6]:2 [inlined]
[19] call_with_reactant(::typeof(square!), ::Reactant.TracedRArray{Int64, 1}, ::Reactant.TracedRArray{Int64, 1})
@ Reactant ~/git/Reactant.jl/src/utils.jl:0
[20] make_mlir_fn(f::Function, args::Tuple{ConcreteRArray{Int64, 1}, ConcreteRArray{Int64, 1}}, kwargs::Tuple{}, name::String, concretein::Bool; toscalar::Bool, return_dialect::Symbol, do_transpose::Bool, no_args_in_result::Bool)
@ Reactant.TracedUtils ~/git/Reactant.jl/src/TracedUtils.jl:216
[21] make_mlir_fn
@ ~/git/Reactant.jl/src/TracedUtils.jl:129 [inlined]
[22] compile_mlir!(mod::Reactant.MLIR.IR.Module, f::Function, args::Tuple{ConcreteRArray{Int64, 1}, ConcreteRArray{Int64, 1}}; optimize::Bool, no_nan::Bool)
@ Reactant.Compiler ~/git/Reactant.jl/src/Compiler.jl:441
[23] compile_mlir!
@ ~/git/Reactant.jl/src/Compiler.jl:432 [inlined]
[24] compile_xla(f::Function, args::Tuple{ConcreteRArray{Int64, 1}, ConcreteRArray{Int64, 1}}; client::Nothing, optimize::Bool, no_nan::Bool, device::Nothing)
@ Reactant.Compiler ~/git/Reactant.jl/src/Compiler.jl:982
[25] compile_xla
@ ~/git/Reactant.jl/src/Compiler.jl:972 [inlined]
[26] compile(f::Function, args::Tuple{ConcreteRArray{Int64, 1}, ConcreteRArray{Int64, 1}}; sync::Bool, kwargs::@Kwargs{client::Nothing, no_nan::Bool, device::Nothing, optimize::Bool})
@ Reactant.Compiler ~/git/Reactant.jl/src/Compiler.jl:1047
[27] top-level scope
@ ~/git/Reactant.jl/src/Compiler.jl:695
"""
code_typed(a.stack[1].exception, interactive=true)
"""
julia> code_typed(a.stack[1].exception, interactive=true)
square_kernel!(x, y) @ Main REPL[18]:1
∘ ── %0 = invoke square_kernel!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)})::Core.Const(nothing)
2 1 ── %1 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range !0\n ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 1023}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !range !0\n ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 1023}\n", "entry"))
│ %2 = Base.llvmcall(%1, Int32, Tuple{})::Int32 ││┃│││ threadIdx_x
│ %3 = Base.add_int(%2, 1)::Int32 │││╻ +
│ %4 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.y(), !range !0\n ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.y() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 1023}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.y(), !range !0\n ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.y() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 1023}\n", "entry"))
│ Base.llvmcall(%4, Int32, Tuple{})::Int32 ││││┃│ macro expansion
│ %6 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.z(), !range !0\n ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.z() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 63}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i32 @entry() #0 {\nentry:\n %0 = call i32 @llvm.nvvm.read.ptx.sreg.tid.z(), !range !0\n ret i32 %0\n}\n\n; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)\ndeclare i32 @llvm.nvvm.read.ptx.sreg.tid.z() #1\n\nattributes #0 = { alwaysinline }\nattributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }\n\n!0 = !{i32 0, i32 63}\n", "entry"))
│ Base.llvmcall(%6, Int32, Tuple{})::Int32 ││││┃│ macro expansion
3 │ %8 = $(Expr(:boundscheck, true))::Bool │╻╷ getindex
└─── goto #4 if not %8 ││┃ #arrayref
2 ── %10 = Core.sext_int(Core.Int64, %3)::Int64 │││╻╷╷╷╷╷╷╷ checkbounds
│ %11 = Base.sle_int(1, %10)::Bool ││││╻ checkbounds
│ %12 = Core.sext_int(Core.Int64, %3)::Int64 │││││╻╷╷╷╷ checkindex
│ %13 = Base.sle_int(%12, 64)::Bool ││││││╻ <=
│ %14 = Base.and_int(%11, %13)::Bool ││││││╻ &
└─── goto #11 if not %14 ││││
3 ── nothing::Nothing │
4 ┄─ %17 = Base.getfield(x, :ptr)::Core.LLVMPtr{Int64, 1} │││╻╷╷╷ arrayref_bits
│ %18 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i64 @entry(i8 addrspace(1)* %0, i32 %1) #0 {\nentry:\n %2 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n %3 = getelementptr inbounds i64, i64 addrspace(1)* %2, i32 %1\n %4 = load i64, i64 addrspace(1)* %3, align 8, !tbaa !0\n ret i64 %4\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i64 @entry(i8 addrspace(1)* %0, i32 %1) #0 {\nentry:\n %2 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n %3 = getelementptr inbounds i64, i64 addrspace(1)* %2, i32 %1\n %4 = load i64, i64 addrspace(1)* %3, align 8, !tbaa !0\n ret i64 %4\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry"))
│ %19 = Base.sub_int(%3, 1)::Int32 │││││┃│││ pointerref
│ %20 = Base.llvmcall(%18, Int64, Tuple{Core.LLVMPtr{Int64, 1}, Int32}, %17, %19)::Int64 ││││││┃│ macro expansion
│ %21 = $(Expr(:boundscheck, true))::Bool ││╻ #arrayref
└─── goto #7 if not %21 │││
5 ── %23 = Core.sext_int(Core.Int64, %3)::Int64 │││╻╷╷╷╷╷╷╷ checkbounds
│ %24 = Base.sle_int(1, %23)::Bool ││││╻ checkbounds
│ %25 = Core.sext_int(Core.Int64, %3)::Int64 │││││╻╷╷╷╷ checkindex
│ %26 = Base.sle_int(%25, 64)::Bool ││││││╻ <=
│ %27 = Base.and_int(%24, %26)::Bool ││││││╻ &
└─── goto #12 if not %27 ││││
6 ── nothing::Nothing │
7 ┄─ %30 = Base.getfield(y, :ptr)::Core.LLVMPtr{Int64, 1} │││╻╷╷╷ arrayref_bits
│ %31 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i64 @entry(i8 addrspace(1)* %0, i32 %1) #0 {\nentry:\n %2 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n %3 = getelementptr inbounds i64, i64 addrspace(1)* %2, i32 %1\n %4 = load i64, i64 addrspace(1)* %3, align 8, !tbaa !0\n ret i64 %4\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine i64 @entry(i8 addrspace(1)* %0, i32 %1) #0 {\nentry:\n %2 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n %3 = getelementptr inbounds i64, i64 addrspace(1)* %2, i32 %1\n %4 = load i64, i64 addrspace(1)* %3, align 8, !tbaa !0\n ret i64 %4\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry"))
│ %32 = Base.sub_int(%3, 1)::Int32 │││││┃│││ pointerref
│ %33 = Base.llvmcall(%31, Int64, Tuple{Core.LLVMPtr{Int64, 1}, Int32}, %30, %32)::Int64 ││││││┃│ macro expansion
│ %34 = Base.mul_int(%20, %33)::Int64 │╻ *
│ %35 = $(Expr(:boundscheck, true))::Bool ││╻ #arrayset
└─── goto #10 if not %35 │││
8 ── %37 = Core.sext_int(Core.Int64, %3)::Int64 │││╻╷╷╷╷╷╷╷ checkbounds
│ %38 = Base.sle_int(1, %37)::Bool ││││╻ checkbounds
│ %39 = Core.sext_int(Core.Int64, %3)::Int64 │││││╻╷╷╷╷ checkindex
│ %40 = Base.sle_int(%39, 64)::Bool ││││││╻ <=
│ %41 = Base.and_int(%38, %40)::Bool ││││││╻ &
└─── goto #13 if not %41 ││││
9 ── nothing::Nothing │
10 ┄ %44 = Base.getfield(x, :ptr)::Core.LLVMPtr{Int64, 1} │││╻╷╷╷ arrayset_bits
│ %45 = Core.tuple("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine void @entry(i8 addrspace(1)* %0, i64 %1, i32 %2) #0 {\nentry:\n %3 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n %4 = getelementptr inbounds i64, i64 addrspace(1)* %3, i32 %2\n store i64 %1, i64 addrspace(1)* %4, align 8, !tbaa !0\n ret void\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry")::Core.Const(("; ModuleID = 'llvmcall'\nsource_filename = \"llvmcall\"\n\n; Function Attrs: alwaysinline\ndefine void @entry(i8 addrspace(1)* %0, i64 %1, i32 %2) #0 {\nentry:\n %3 = bitcast i8 addrspace(1)* %0 to i64 addrspace(1)*\n %4 = getelementptr inbounds i64, i64 addrspace(1)* %3, i32 %2\n store i64 %1, i64 addrspace(1)* %4, align 8, !tbaa !0\n ret void\n}\n\nattributes #0 = { alwaysinline }\n\n!0 = !{!1, !1, i64 0, i64 0}\n!1 = !{!\"custom_tbaa_addrspace(1)\", !2, i64 0}\n!2 = !{!\"custom_tbaa\"}\n", "entry"))
│ %46 = Base.sub_int(%3, 1)::Int32 │││││┃│││ pointerset
│ Base.llvmcall(%45, Nothing, Tuple{Core.LLVMPtr{Int64, 1}, Int64, Int32}, %44, %34, %46)::Core.Const(nothing) ││││││┃│ macro expansion
4 │ $(Expr(:foreigncall, "llvm.nvvm.barrier0", Nothing, svec(), 0, :(:llvmcall)))::Core.Const(nothing) │╻ sync_threads
5 └─── return Main.nothing │
3 11 ─ invoke CUDA.throw_boundserror()::Union{} │╻╷╷╷ getindex
└─── unreachable ││┃││ #arrayref
12 ─ invoke CUDA.throw_boundserror()::Union{} ││╻╷╷ #arrayref
└─── unreachable │││┃│ checkbounds
13 ─ invoke CUDA.throw_boundserror()::Union{} ││╻╷╷ #arrayset
└─── unreachable │││┃│ checkbounds
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
• %50 = invoke throw_boundserror()::Union{}
%52 = invoke throw_boundserror()::Union{}
%54 = invoke throw_boundserror()::Union{}
↩
square_kernel!(x, y) @ Main REPL[18]:1
Variables
#self#::Core.Const(Main.square_kernel!)
x::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
y::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
i::Int32
∘ ─ %0 = invoke square_kernel!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)})::Core.Const(nothing)
@ REPL[18]:2 within `square_kernel!`
1 ─ %1 = Main.threadIdx()::@NamedTuple{x::Int32, y::Int32, z::Int32}
│ (i = Base.getproperty(%1, :x))::Int32
│ @ REPL[18]:3 within `square_kernel!`
│ %3 = Main.:*::Core.Const(*)
│ %4 = i::Int32
│ %5 = Base.getindex(x, %4)::Int64
│ %6 = i::Int32
│ %7 = Base.getindex(y, %6)::Int64
│ %8 = (%3)(%5, %7)::Int64
│ %9 = i::Int32
│ Base.setindex!(x, %8, %9)::Any
│ @ REPL[18]:4 within `square_kernel!`
│ Main.sync_threads()::Any
│ @ REPL[18]:5 within `square_kernel!`
└── return Main.nothing
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
%1 = #threadIdx()::@NamedTuple{x::Int32, y::Int32, z::Int32}
%2 = = < semi-concrete eval > getproperty(::@NamedTuple{x::Int32, y::Int32, z::Int32},::Core.Const(:x))::Int32
• %5 = getindex(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
%7 = getindex(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
%8 = *(::Int64,::Int64)::Int64
%10 = setindex!(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int64,::Int32)::Any
%11 = sync_threads()::Any
↩
getindex(A::ReactantCUDAExt.CuTracedArray{T}, i1::Integer) where T @ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:151
Variables
#self#::Core.Const(getindex)
A::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
i1::Int32
∘ ─ %0 = invoke getindex(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
1 ─ nothing::Core.Const(nothing)
│ @ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:151 within `getindex`
│ %2 = ReactantCUDAExt.arrayref(A, i1)::Int64
└── return %2
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
• %2 = #arrayref(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
↩
arrayref(A::ReactantCUDAExt.CuTracedArray{T}, index::Integer) where T @ ReactantCUDAExt ~/git/Reactant.jl/ext/ReactantCUDAExt.jl:59
Variables
#self#::Core.Const(ReactantCUDAExt.arrayref)
A::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
index::Int32
∘ ─ %0 = invoke #arrayref(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
1 ─ nothing::Core.Const(nothing)
│ @ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:62 within `#arrayref`
│ %2 = $(Expr(:boundscheck))::Bool
└── goto #3 if not %2
2 ─ %4 = ReactantCUDAExt.checkbounds::Core.Const(checkbounds)
└── (%4)(A, index)::Any
@ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:63 within `#arrayref`
3 ┄ %6 = Base.isbitsunion::Core.Const(Base.isbitsunion)
│ %7 = $(Expr(:static_parameter, 1))::Core.Const(Int64)
│ %8 = (%6)(%7)::Core.Const(false)
└── goto #5 if not %8
@ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:64 within `#arrayref`
4 ─ Core.Const(:(ReactantCUDAExt.arrayref_union))::Union{}
│ Core.Const(:((%10)(A, index)))::Union{}
└── Core.Const(:(return %11))::Union{}
@ /home/wmoses/git/Reactant.jl/ext/ReactantCUDAExt.jl:66 within `#arrayref`
5 ┄ %13 = ReactantCUDAExt.arrayref_bits::Core.Const(ReactantCUDAExt.arrayref_bits)
│ %14 = (%13)(A, index)::Int64
└── return %14
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
• %5 = checkbounds(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Any
%8 = isbitsunion(::Type{Int64})::Core.Const(false)
%14 = arrayref_bits(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Int64
↩
checkbounds(A::AbstractArray, I...) @ Base abstractarray.jl:697
Variables
#self#::Core.Const(checkbounds)
A::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
I::Tuple{Int32}
∘ ─ %0 = invoke checkbounds(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Core.Const(nothing)
@ abstractarray.jl:698 within `checkbounds`
1 ─ nothing::Core.Const(nothing)
│ @ abstractarray.jl:699 within `checkbounds`
│ %2 = Base.checkbounds::Core.Const(checkbounds)
│ %3 = Core.tuple(Base.Bool, A)::Core.PartialStruct(Tuple{DataType, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}}, Any[Type{Bool}, ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}])
│ %4 = Core._apply_iterate(Base.iterate, %2, %3, I)::Bool
└── goto #3 if not %4
2 ─ goto #4
3 ─ Base.throw_boundserror(A, I)::Union{}
4 ┄ return Base.nothing
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
%4 = checkbounds(::Type{Bool},::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Int32)::Bool
• %7 = #throw_boundserror(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Tuple{Int32})::Union{}
↩
throw_boundserror(A, I) @ CUDA ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:53
Variables
#self#::Core.Const(Base.throw_boundserror)
A::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)}
I::Tuple{Int32}
∘ ─ %0 = invoke #throw_boundserror(::ReactantCUDAExt.CuTracedArray{Int64, 1, 1, (64,)},::Tuple{Int32})::Union{}
1 ─ nothing::Core.Const(nothing)
│ @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:53 within `#throw_boundserror`
│ CUDA.throw_boundserror()::Union{}
└── Core.Const(:(return %2))::Union{}
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
• %2 = throw_boundserror()::Union{}
↩
throw_boundserror() @ CUDA ~/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:51
Variables
#self#::Core.Const(CUDA.throw_boundserror)
info::Ptr{CUDA.ExceptionInfo_st}
∘ ─ %0 = invoke throw_boundserror()::Union{}
1 ─ nothing::Core.Const(nothing)
│ @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:12 within `throw_boundserror`
│ %2 = CUDA.kernel_state()::CUDA.KernelState
│ (info = Base.getproperty(%2, :exception_info))::Ptr{CUDA.ExceptionInfo_st}
│ @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:13 within `throw_boundserror`
│ %4 = CUDA._strptr(Val{:BoundsError}())::Ptr{UInt8}
│ %5 = info::Ptr{CUDA.ExceptionInfo_st}
│ Base.setproperty!(%5, :subtype, %4)::Any
│ @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:14 within `throw_boundserror`
│ %7 = CUDA._strptr(Val{Symbol("Out-of-bounds array access")}())::Ptr{UInt8}
│ %8 = info::Ptr{CUDA.ExceptionInfo_st}
│ Base.setproperty!(%8, :reason, %7)::Any
│ @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:15 within `throw_boundserror`
│ CUDA.throw(CUDA.nothing)::Union{}
└── Core.Const(:(return %10))::Union{}
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
• %2 = kernel_state()::CUDA.KernelState
%3 = = < semi-concrete eval > getproperty(::CUDA.KernelState,::Core.Const(:exception_info))::Ptr{CUDA.ExceptionInfo_st}
%4 = _strptr(::Val{:BoundsError})::Ptr{UInt8}
%6 = setproperty!(::Ptr{CUDA.ExceptionInfo_st},::Symbol,::Ptr{UInt8})::Any
%7 = _strptr(::Val{Symbol("Out-of-bounds array access")})::Ptr{UInt8}
%9 = setproperty!(::Ptr{CUDA.ExceptionInfo_st},::Symbol,::Ptr{UInt8})::Any
↩
; @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:51 within `throw_boundserror`
; Function Attrs: noinline noreturn
define swiftcc void @julia_throw_boundserror_37034(ptr nonnull swiftself %pgcstack_arg) #0 {
top:
%0 = alloca { i64, i32 }, align 8
%pgcstack = call ptr @julia.get_pgcstack()
%current_task = getelementptr inbounds ptr, ptr %pgcstack, i64 -14
%world_age = getelementptr inbounds i64, ptr %current_task, i64 15
%current_task1 = getelementptr inbounds ptr, ptr %pgcstack, i64 -14
%ptls_field = getelementptr inbounds ptr, ptr %current_task1, i64 16
%ptls_load = load ptr, ptr %ptls_field, align 8
%1 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
%safepoint = load ptr, ptr %1, align 8
fence syncscope("singlethread") seq_cst
call void @julia.safepoint(ptr %safepoint)
fence syncscope("singlethread") seq_cst
; @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:12 within `throw_boundserror`
; ┌ @ none within `kernel_state`
; │┌ @ none within `macro expansion` @ /home/wmoses/.julia/packages/LLVM/b3kFs/src/interop/base.jl:39
%2 = load ptr, ptr @"*Core.tuple#37036", align 8
%3 = getelementptr inbounds ptr, ptr %2, i64 0
%4 = load ptr, ptr @"jl_global#37037", align 8
%5 = insertvalue [2 x ptr] zeroinitializer, ptr %4, 0
%6 = load ptr, ptr @"jl_global#37038", align 8
%7 = insertvalue [2 x ptr] %5, ptr %6, 1
%8 = load ptr, ptr @"*Core.Intrinsics.llvmcall#37039", align 8
%9 = getelementptr inbounds ptr, ptr %8, i64 0
%10 = call { i64, i32 } @julia_throw_boundserror_37034u37040()
store { i64, i32 } %10, ptr %0, align 8
; └└
%11 = load ptr, ptr @"*Main.Base.getproperty#37041", align 8
%12 = getelementptr inbounds ptr, ptr %11, i64 0
%13 = load ptr, ptr @"jl_global#37042", align 8
%14 = load ptr, ptr @"+CUDA.KernelState#37043", align 8
%15 = ptrtoint ptr %14 to i64
%16 = inttoptr i64 %15 to ptr
%current_task2 = getelementptr inbounds ptr, ptr %pgcstack, i64 -14
%17 = call noalias nonnull align 8 dereferenceable(16) ptr @julia.gc_alloc_obj(ptr %current_task2, i64 16, ptr %16) #9
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %17, ptr align 8 %0, i64 16, i1 false)
%18 = load ptr, ptr @"jl_sym#exception_info#37044", align 8
%19 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %13, ptr %17, ptr %18)
; @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:13 within `throw_boundserror`
%20 = load ptr, ptr @"*CUDA._strptr#37045", align 8
%21 = getelementptr inbounds ptr, ptr %20, i64 0
%22 = load ptr, ptr @"jl_global#37046", align 8
%23 = load ptr, ptr @"jl_global#37047", align 8
%24 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %22, ptr %23)
%25 = load ptr, ptr @"*Main.Base.setproperty!#37048", align 8
%26 = getelementptr inbounds ptr, ptr %25, i64 0
%27 = load ptr, ptr @"jl_global#37049", align 8
%28 = load ptr, ptr @"jl_sym#subtype#37050", align 8
%29 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %27, ptr %19, ptr %28, ptr %24)
; @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:14 within `throw_boundserror`
%30 = load ptr, ptr @"*CUDA._strptr#37045", align 8
%31 = getelementptr inbounds ptr, ptr %30, i64 0
%32 = load ptr, ptr @"jl_global#37046", align 8
%33 = load ptr, ptr @"jl_global#37051", align 8
%34 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %32, ptr %33)
%35 = load ptr, ptr @"*Main.Base.setproperty!#37048", align 8
%36 = getelementptr inbounds ptr, ptr %35, i64 0
%37 = load ptr, ptr @"jl_global#37049", align 8
%38 = load ptr, ptr @"jl_sym#reason#37052", align 8
%39 = call nonnull ptr (ptr, ptr, ...) @julia.call(ptr @ijl_apply_generic, ptr %37, ptr %19, ptr %38, ptr %34)
; @ /home/wmoses/.julia/packages/CUDA/1kIOw/src/device/quirks.jl:15 within `throw_boundserror`
%40 = load ptr, ptr @"*Core.throw#37053", align 8
%41 = getelementptr inbounds ptr, ptr %40, i64 0
%42 = load ptr, ptr @"*Core.nothing#37054", align 8
%43 = getelementptr inbounds ptr, ptr %42, i64 0
%44 = load ptr, ptr @jl_nothing, align 8
call void @ijl_throw(ptr %44)
unreachable
after_throw: ; No predecessors!
call void @llvm.trap()
unreachable
after_noret: ; No predecessors!
call void @llvm.trap()
unreachable
}
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native, [j]ump to source always, [o]ptimize, [d]ebuginfo, [r]emarks, [e]ffects, e[x]ception types, [i]nlining costs.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
• %2 = kernel_state()::CUDA.KernelState
%3 = = < semi-concrete eval > getproperty(::CUDA.KernelState,::Core.Const(:exception_info))::Ptr{CUDA.ExceptionInfo_st}
%4 = _strptr(::Val{:BoundsError})::Ptr{UInt8}
%6 = setproperty!(::Ptr{CUDA.ExceptionInfo_st},::Symbol,::Ptr{UInt8})::Any
%7 = _strptr(::Val{Symbol("Out-of-bounds array access")})::Ptr{UInt8}
%9 = setproperty!(::Ptr{CUDA.ExceptionInfo_st},::Symbol,::Ptr{UInt8})::Any
↩
""" |
Changing it to `@inline` resolves the issue, as in JuliaGPU/CUDA.jl#2633.
Perhaps I'm wrong, but my understanding was that the no-inline was there to force users to resolve bounds checks at compile time, either by writing code that lets the compiler prove them automatically or by adding `@inbounds`.
I mean, if that's the case then the code here is itself invalid (since it doesn't contain an `@inbounds`), and it's only caught on 1.11. CUDA.jl, however, supports such errors and will even print the error message, so that feels wrong.
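If that reading is right, the immediate workaround for the reproducer would be to elide the check in the kernel body. A sketch of the kernel from above (assuming CUDA.jl is loaded; this is device code and only compiles for GPU targets):

```julia
using CUDA  # assumed available

function square_kernel!(x, y)
    i = threadIdx().x
    # Elide the bounds check that otherwise lowers to the dynamic
    # invoke of CUDA.throw_boundserror seen in the InvalidIRError.
    @inbounds x[i] *= y[i]
    sync_threads()
    return nothing
end
```

That sidesteps the symptom but, as noted above, CUDA.jl is supposed to support the checked path too, so it doesn't feel like the real fix.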
What is the
The core issue seems to be:
What are those globals?
But note that Cthulhu is likely lying to you about the LLVM IR: JuliaDebug/Cthulhu.jl#510
cc @gbaraldi |
I can confirm JuliaLang/julia#57224 fixes this for me (I tested it in the backport PR: JuliaLang/julia#57183). We can close this after v1.11.4 is released.
After the upgrade to Julia v1.11.3 we're getting this `InvalidIRError` in the CUDA integration tests on machines without a GPU.
See e.g. Julia 1.11 - integration - ubuntu-20.04 - x64 - packaged libReactant - assertions=false - push
I'm going to skip these tests to keep CI greener, but this issue still needs to be resolved.
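One way to keep the skip narrow is to gate it on the affected Julia versions. A hypothetical sketch (the testset name and placeholder tests are illustrative, not the real suite):

```julia
using Test

# Versions known to mis-handle the device-side bounds-error path;
# the fix is expected in the v1.11.4 backport (JuliaLang/julia#57183).
const CUDA_IR_BUG = v"1.11.0" <= VERSION < v"1.11.4"

@testset "CUDA integration" begin
    if CUDA_IR_BUG
        @test_skip false   # placeholder for the real kernel test
    else
        @test 1 + 1 == 2   # placeholder for the real kernel test
    end
end
```

This keeps CI green on the broken patch releases while the tests run unchanged everywhere else.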