Kernel using StaticArray compiles in julia v1.9.4 but not in v1.10.2 #2313

Comments
MArray is an unreliable type that makes your code depend on the optimizations of the LLVM-level allocopt pass. You should be using an array type that doesn't rely on mutable semantics that are then optimized away. That said, with this specific example it looks like Base codegen for …
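A minimal sketch of the contrast (hypothetical kernel names, assuming CUDA.jl and StaticArrays.jl): the `MVector` version only stays GPU-compatible if allocopt can prove the mutation away, while the `SVector`/`ntuple` version never creates a mutable object in the first place.

```julia
using CUDA, StaticArrays

# Relies on allocopt: the MVector must be proven non-escaping and elided.
function kernel_mutable(out)
    i = threadIdx().x
    tmp = MVector{4,Float32}(undef)
    for j in 1:4
        @inbounds tmp[j] = Float32(i + j)
    end
    @inbounds out[i] = sum(tmp)
    return
end

# No mutation at all: the SVector is built in one shot and lives in registers.
function kernel_immutable(out)
    i = threadIdx().x
    tmp = SVector{4,Float32}(ntuple(j -> Float32(i + j), Val(4)))
    @inbounds out[i] = sum(tmp)
    return
end

out = CUDA.zeros(Float32, 4)
@cuda threads=4 kernel_immutable(out)
```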
Thanks for taking a look! Do you have a suggested array type? This example is a reduction from a KernelAbstractions kernel: https://github.com/HorribleSanity/Raven.jl/blob/971310fdb4a691281b33b0a6114f8cd2c4d1f01f/src/lobattocells.jl#L2441-L2583. That said, I have found a mutable array of registers quite useful for minimizing code duplication, e.g., Lines 96 to 223 in 7f725c0.
Are you saying there is not a recommended way of achieving this?
Because we don't have a good way to express a mutable, on-stack array type in Julia right now. IIUC, ImmutableArrays were supposed to solve this, but that effort stalled. A safer way is to use an immutable array type and … In the meantime, using MArray isn't bad per se; you just have to be aware of its fragility, meaning that minor changes to Julia's codegen can result in LLVM becoming confused and your previously stack-allocated MArray now becoming heap-allocated (and as a result GPU-incompatible).
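For the "immutable array type plus non-mutating updates" route, a small sketch (assuming StaticArrays.jl, which defines a non-mutating `Base.setindex` for its types):

```julia
using StaticArrays

v = SVector(1.0, 2.0, 3.0)
w = Base.setindex(v, 10.0, 2)   # new SVector(1.0, 10.0, 3.0); v is untouched

# Accumulating into an "array of registers" without mutation:
function squares3()
    acc = zero(SVector{3,Float64})
    for k in 1:3
        acc = Base.setindex(acc, Float64(k^2), k)   # rebind, don't mutate
    end
    return acc
end
```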
I think the core issue here is that the …
That is too bad. That is what I really want.
Thanks for the references. They seem reasonable for the uses I have of …
I think this is the issue as well. Two questions: …
Yeah, globally turning on bounds checking with … will break … Yeah, as long as …
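For context, a sketch of the interaction (function name is illustrative, assuming StaticArrays.jl): with `@inbounds` the throw path that would capture the `MArray` in a `BoundsError` is removed, so the array can stay in registers; running with `--check-bounds=yes` re-enables those checks globally, and the capture comes back.

```julia
using StaticArrays

function fill_and_sum(x)
    tmp = MVector{4,Float64}(undef)
    for j in 1:4
        # Without @inbounds (or with --check-bounds=yes), the failure path
        # constructs BoundsError(tmp, j), which captures `tmp` and forces
        # it to escape, defeating stack allocation.
        @inbounds tmp[j] = x + j
    end
    return sum(tmp)
end
```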
This quirk allows bounds checking of `MArray`s on device in more cases. This is a workaround to address <JuliaGPU#2313>.
This PR #2314 fixes my current issue. I understand that it is probably not the most desirable solution, but I wonder if it is something worth adding because of KA's current reliance on …
This quirk allows bounds checking of `MArray`s on device in more cases. This is a workaround to address <JuliaGPU/CUDA.jl#2313>. This change was proposed upstream here <JuliaGPU/CUDA.jl#2314> and is no longer needed if accepted upstream.
I'm not opposed to such a quirk if it's required, but I'm surprised it's necessary in the first place, as the CPU compilation pipeline already optimizes everything away properly.
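One way to check the CPU side of that claim (a sketch; the reproducer is illustrative) is to look at the optimized host IR and confirm no allocation calls remain:

```julia
using StaticArrays, InteractiveUtils

function host_repro(x)
    tmp = MVector{4,Float64}(undef)
    for j in 1:4
        @inbounds tmp[j] = x + j
    end
    return sum(tmp)
end

# If the CPU pipeline optimizes the MVector away, the printed IR should
# contain no `jl_gc_pool_alloc`-style calls.
@code_llvm debuginfo=:none host_repro(1.0)
```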
With … Last time I checked, this would also happen on the CPU pipeline and is a side-effect of our BoundsError design.
Sure, for that case the quirk is probably still useful. However, without that, the MWE from this thread only reproduces using GPUCompiler.jl, so it might be worth looking into.
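The device-side counterpart (a sketch, assuming CUDA.jl; kernel and names are illustrative) is to dump the optimized GPU IR and look for leftover allocation calls such as `gpu_gc_pool_alloc`:

```julia
using CUDA, StaticArrays

function gpu_repro(out)
    i = threadIdx().x
    tmp = MVector{4,Float32}(undef)
    for j in 1:4
        @inbounds tmp[j] = Float32(i + j)
    end
    @inbounds out[i] = sum(tmp)
    return
end

out = CUDA.zeros(Float32, 4)
# Dump the optimized device IR for this launch.
@device_code_llvm debuginfo=:none @cuda threads=4 gpu_repro(out)
```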
I haven't found the difference yet, but I suspect it's in the optimization pipeline.

Pre-optimization IR: 1.10 / 1.9; post-optimization IR: 1.10 / 1.9 (IR dumps not included here).
In both cases, dead-arg elimination (which runs very late) removed the actual argument to the CUDA.jl-specific … In both cases, what remains is a …
Okay, JuliaGPU/GPUCompiler.jl@cf6bcab fixes this case. Still not ideal since we have a … Note the PR is against current GPUCompiler, but locally I had backported it to 0.25.
This avoids capturing `MArray`s in `BoundsError`, to allow bounds checking of `MArray`s on device in more cases. This is a workaround to address <JuliaGPU#2313>.
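For illustration, a hypothetical sketch of what such a quirk could look like; this is not the actual CUDA.jl/GPUCompiler code, `@device_override` is CUDA.jl's internal method-overlay helper, and the signature here is an assumption. The idea is to throw a `BoundsError` that does not capture the `MArray`, so the array itself never escapes on device.

```julia
using CUDA, StaticArrays

# Hypothetical quirk sketch (assumes CUDA.jl's internal @device_override):
# drop the array from the exception so that the bounds-checked path no
# longer forces the MArray onto the heap on device.
CUDA.@device_override @noinline function Base.throw_boundserror(A::MArray, I)
    throw(BoundsError())   # array and indices intentionally omitted
end
```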
Fixed by JuliaGPU/GPUCompiler.jl#559
Describe the bug
The kernel compiles and runs with julia v1.9.4 but fails to compile with julia v1.10.2, with the error:
To reproduce
The Minimal Working Example (MWE) for this bug:
Manifest.toml
Julia v1.9.4 Manifest.toml
Expected behavior
I expect the code to compile on both versions of julia.
Version info
Details on Julia:
Details on CUDA:
Additional context
Both versions of julia give the same `@device_code_warntype` output.