Simple gpu loop on CUDA does not return #137

pagnani · 2022-02-14T14:37:41Z

On Julia 1.7.2, in a new environment containing only the packages listed below, this code:

```julia
using Tullio, CUDA, LoopVectorization, CUDAKernels, KernelAbstractions
function gpr(N, L)
    Jseq = rand(Float32, N + 2, N + 2, L, L, 2, 2) |> cu
    conditional = rand(Float32, N + 2, N + 2, L, L, 2, 2) |> cu
    @tullio g[nl, nl1, l, xl, xl1] := conditional[ni, nl, i, l, xi, xl] * Jseq[ni, nj, i, j, xi, xj] * conditional[nj, nl1, j, l+1, xj, xl1] * (i <= l) * (j > l) * (j > i + 1)

    return g
end
julia> N=5; L=3; gpr(N,L)
```

never returns, and GPU usage stays at 100%.

Output of `pkg> status`:

  [052768ef] CUDA v3.8.0
  [72cfdca4] CUDAKernels v0.3.3
  [63c18a36] KernelAbstractions v0.7.2
  [bdcacae8] LoopVectorization v0.12.101
  [bc48ee85] Tullio v0.3.3

CuDevice(0): TITAN RTX
CUDA 11.0.0

Thanks a lot!


mcabbott · 2022-02-19T17:33:43Z

Thanks for the report. I can reproduce this, but have no idea what causes it.

It works on the CPU with `threads=false` (so that KernelAbstractions is used) and `verbose=true` (to print what Tullio infers):

```julia
julia> N=5; L=3; gpr(N,L)
┌ Info: left index ranges
│   nl = Base.OneTo(7)
│   nl1 = Base.OneTo(7)
│   l = 1:2
│   xl = Base.OneTo(2)
└   xl1 = Base.OneTo(2)
┌ Info: reduction index ranges
│   ni = Base.OneTo(7)
│   i = Base.OneTo(3)
│   xi = Base.OneTo(2)
│   nj = Base.OneTo(7)
│   j = Base.OneTo(3)
└   xj = Base.OneTo(2)
[ Info: running KernelAbstractions CPU actor
7×7×2×2×2 Array{Float32, 5}:
[:, :, 1, 1, 1] =
 19.8817  22.2586  23.2881  20.2121  19.9547  22.5193  20.0603
 ...
```
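For scale, the index ranges printed above can be multiplied out. The snippet below is only that arithmetic in plain Julia (the variable names are illustrative, not anything Tullio defines). Before masking, the whole contraction is under a million multiply-adds, so presumably the hang is not explained by the amount of work alone.

```julia
# Index ranges copied from the verbose=true output above (N = 5, L = 3)
left = (nl = 7, nl1 = 7, l = 2, xl = 2, xl1 = 2)            # output indices
reduction = (ni = 7, i = 3, xi = 2, nj = 7, j = 3, xj = 2)  # summed indices

nloops = length(left) + length(reduction)  # 11 nested loops
n_out = prod(values(left))                  # 392 elements of g
n_terms = prod(values(reduction))           # 1764 terms summed per element
n_total = n_out * n_terms                   # 691488 executions of the innermost body

println("loops = $nloops, outputs = $n_out, terms per output = $n_terms, total = $n_total")
```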

On the GPU, it still seems to hang even if I comment out the masks `* (i <= l) * (j > l) * (j > i + 1)`.

I wonder whether this is simply too many loops for KA to handle, or whether it hits some step, e.g. an optimisation pass whose cost grows factorially with the number of loops. Eleven nested loops (5 output indices plus 6 reduction indices) is quite deep, and it may be that nobody has tested that many. If so, the next step is probably to run it with `verbose=2`, which prints the kernel being used, so that we can try to reproduce this without Tullio.
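A plain-loop version of the same contraction can also serve as a CPU reference to check any GPU result against, and shows the 5 + 6 loop nest explicitly. This is a minimal sketch, not what Tullio generates: `gpr_loops` is a hypothetical helper, and it assumes both inputs have the shapes used in `gpr`, i.e. `(N+2, N+2, L, L, 2, 2)`, with finite values.

```julia
# Hypothetical plain-loop CPU reference for the @tullio contraction in gpr (no Tullio, KA or CUDA).
# Assumes conditional and Jseq both have shape (N+2, N+2, L, L, 2, 2).
function gpr_loops(conditional::AbstractArray{T,6}, Jseq::AbstractArray{T,6}) where {T<:Real}
    Nn = size(conditional, 1)  # N + 2: range of ni, nj, nl, nl1
    L = size(conditional, 3)   # range of i and j; l stops at L - 1 because of l + 1
    X = size(conditional, 5)   # 2: range of xi, xj, xl, xl1
    g = zeros(T, Nn, Nn, L - 1, X, X)
    # 5 output loops (the first listed is outermost, so nl is innermost)
    for xl1 in 1:X, xl in 1:X, l in 1:L-1, nl1 in 1:Nn, nl in 1:Nn
        acc = zero(T)
        # 6 reduction loops
        for xj in 1:X, j in 1:L, nj in 1:Nn, xi in 1:X, i in 1:L, ni in 1:Nn
            # Skipping masked terms is equivalent to multiplying by the Bool masks for finite values
            (i <= l && j > l && j > i + 1) || continue
            acc += conditional[ni, nl, i, l, xi, xl] *
                   Jseq[ni, nj, i, j, xi, xj] *
                   conditional[nj, nl1, j, l+1, xj, xl1]
        end
        g[nl, nl1, l, xl, xl1] = acc
    end
    return g
end
```

To compare with the `@tullio` result, the same input arrays would need to be passed to both, e.g. `Array(g_gpu) ≈ gpr_loops(Array(conditional), Array(Jseq))`, rather than letting `gpr` generate fresh random inputs.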

mcabbott added the GPU label Apr 28, 2023
