Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inbounds propagation mostly missing in base/genericmemory.jl? #56145

Open
nsajko opened this issue Oct 13, 2024 · 8 comments
Open

inbounds propagation mostly missing in base/genericmemory.jl? #56145

nsajko opened this issue Oct 13, 2024 · 8 comments
Labels
arrays [a, r, r, a, y, s] performance Must go faster

Comments

@nsajko
Copy link
Contributor

nsajko commented Oct 13, 2024

I'm not sure if this is intentional, and I'd have to remind myself of the details of how inbounds propagation works to understand this better, but it seems that, for example, setindex!(::Memory, ::Any, ::Int) isn't propagating inbounds. Discovered by @giordano in JuliaArrays/FixedSizeArrays.jl#65.

Perhaps it's just necessary to add @_propagate_inbounds_meta to some methods in genericmemory.jl 🤷.

@nsajko nsajko added arrays [a, r, r, a, y, s] performance Must go faster labels Oct 13, 2024
@giordano
Copy link
Contributor

giordano commented Oct 13, 2024

I wonder if this is the cause of the lack of vectorisation of the stores reported in JuliaArrays/FixedSizeArrays.jl#68 (comment): I'm struggling to see what's substantially different in the setindex! methods for Arrays and FixedSizeArrays, but missing @_propagate_inbounds_meta in genericmemory code would explain that. For example, comparing

julia> code_llvm((Vector{Float64},)) do v
           for idx in eachindex(v)
               v[idx] = 1.0
           end
       end
[...]
L13.preheader17:                                  ; preds = %top
  %memoryref_data = load ptr, ptr %"v::Array", align 8
;  @ REPL[22]:3 within `#8`
; ┌ @ array.jl:989 within `setindex!`
; │┌ @ array.jl:993 within `_setindex!`
    %invariant.gep = getelementptr i8, ptr %memoryref_data, i64 -8
    %min.iters.check = icmp ult i64 %"v::Array.size.0.copyload", 8
    br i1 %min.iters.check, label %scalar.ph, label %vector.ph
[...]

vs

julia> code_llvm((FixedSizeVector{Float64,Memory{Float64}},)) do v
           for idx in eachindex(v)
               v[idx] = 1.0
           end
       end
[...]
L13.preheader15:                                  ; preds = %top
  %memoryref_mem = load ptr, ptr %.roots.v, align 8
  %memory_data_ptr = getelementptr inbounds { i64, ptr }, ptr %memoryref_mem, i64 0, i32 1
;  @ REPL[11]:3 within `#26`
; ┌ @ /Users/mose/.julia/packages/FixedSizeArrays/rX5Xp/src/FixedSizeArrays.jl:60 within `setindex!` @ genericmemory.jl:240
; │┌ @ boot.jl:564 within `memoryref`
    %memoryref_data.pre = load ptr, ptr %memory_data_ptr, align 8
; │└
; │ @ /Users/mose/.julia/packages/FixedSizeArrays/rX5Xp/src/FixedSizeArrays.jl:59 within `setindex!`
; │┌ @ abstractarray.jl:699 within `checkbounds`
    br label %L30
[...]

Maybe

julia/base/boot.jl

Lines 564 to 566 in 67c93b9

memoryref(mem::GenericMemory) = memoryrefnew(mem)
memoryref(mem::GenericMemory, i::Integer) = memoryrefnew(memoryrefnew(mem), Int(i), @_boundscheck)
memoryref(ref::GenericMemoryRef, i::Integer) = memoryrefnew(ref, Int(i), @_boundscheck)
should propagate the inbounds?

Edit: on a closer inspection I'm not sure what I wrote is related to the issue reported above, I opened a different ticket to continue the discussion about that: JuliaArrays/FixedSizeArrays.jl#70

@vchuravy
Copy link
Member

I have been fixing some of these in #56151 and #55902

@vchuravy
Copy link
Member

On #56151

julia> code_llvm((FixedSizeVector{Float64,Memory{Float64}},), debuginfo=:none) do v
                 for idx in eachindex(v)
                     v[idx] = 1.0
                 end
             end
; Function Signature: var"#8"(FixedSizeArrays.FixedSizeArray{Float64, 1, Memory{Float64}})
define void @"julia_#8_3933"(ptr nocapture noundef nonnull readonly align 8 dereferenceable(16) %"v::FixedSizeArray", ptr nocapture readonly %.roots.v) #0 {
top:
  %0 = getelementptr inbounds i8, ptr %"v::FixedSizeArray", i64 8
  %.unbox = load i64, ptr %0, align 8
  %1 = icmp slt i64 %.unbox, 1
  br i1 %1, label %L48, label %L13.preheader15

L13.preheader15:                                  ; preds = %top
  %memoryref_mem = load ptr, ptr %.roots.v, align 8
  %memory_data_ptr = getelementptr inbounds { i64, ptr }, ptr %memoryref_mem, i64 0, i32 1
  %memoryref_data.pre = load ptr, ptr %memory_data_ptr, align 8
  br label %L30

L30:                                              ; preds = %L30, %L13.preheader15
  %value_phi3 = phi i64 [ %3, %L30 ], [ 1, %L13.preheader15 ]
  %memoryref_offset = shl i64 %value_phi3, 3
  %2 = getelementptr i8, ptr %memoryref_data.pre, i64 %memoryref_offset
  %memoryref_data6 = getelementptr i8, ptr %2, i64 -8
  store i64 4607182418800017408, ptr %memoryref_data6, align 8
  %3 = add nuw i64 %value_phi3, 1
  %4 = icmp ult i64 %value_phi3, %.unbox
  br i1 %4, label %L30, label %L48

L48:                                              ; preds = %L30, %top
  ret void
}

@vchuravy
Copy link
Member

vchuravy commented Oct 14, 2024

With JULIA_LLVM_ARGS="--print-before=loop-vectorize --pass-remarks-analysis=loop-vectorize"

; *** IR Dump Before LoopVectorizePass on julia_#17_2310 ***
define void @"julia_#17_2310"(ptr addrspace(11) nocapture noundef nonnull readonly align 8 dereferenceable(16) %"v::FixedSizeArray", ptr nocapture readonly %.roots.v) #0 !dbg !5 {
top:
  %pgcstack = call ptr @julia.get_pgcstack()
  %memoryref_mem = load ptr addrspace(10), ptr %.roots.v, align 8, !tbaa !24, !alias.scope !28, !noalias !31
  call void @llvm.dbg.declare(metadata ptr addrspace(11) %"v::FixedSizeArray", metadata !23, metadata !DIExpression()), !dbg !36
  %0 = getelementptr inbounds i8, ptr addrspace(11) %"v::FixedSizeArray", i64 8, !dbg !37
  %.unbox = load i64, ptr addrspace(11) %0, align 8, !dbg !51, !tbaa !62, !invariant.load !10, !alias.scope !64, !noalias !65
  %1 = icmp slt i64 %.unbox, 1, !dbg !51
  br i1 %1, label %L48, label %L13.preheader15, !dbg !36

L13.preheader15:                                  ; preds = %top
  %2 = addrspacecast ptr addrspace(10) %memoryref_mem to ptr addrspace(11)
  %memory_data_ptr = getelementptr inbounds { i64, ptr }, ptr addrspace(11) %2, i64 0, i32 1
  br label %L30, !dbg !66

L30:                                              ; preds = %L30, %L13.preheader15
  %value_phi3 = phi i64 [ %5, %L30 ], [ 1, %L13.preheader15 ]
  %memoryref_data = load ptr, ptr addrspace(11) %memory_data_ptr, align 8, !dbg !71, !tbaa !62, !invariant.load !10, !alias.scope !64, !noalias !65, !nonnull !10
  %memoryref_offset = shl i64 %value_phi3, 3, !dbg !74
  %3 = call ptr addrspace(13) @julia.gc_loaded(ptr addrspace(10) %memoryref_mem, ptr %memoryref_data), !dbg !78
  %4 = getelementptr i8, ptr addrspace(13) %3, i64 %memoryref_offset, !dbg !78
  %memoryref_data6 = getelementptr i8, ptr addrspace(13) %4, i64 -8, !dbg !78
  store i64 4607182418800017408, ptr addrspace(13) %memoryref_data6, align 8, !dbg !78, !tbaa !79, !alias.scope !81, !noalias !82
  %5 = add nuw i64 %value_phi3, 1, !dbg !83
  %6 = icmp ult i64 %value_phi3, %.unbox, !dbg !84
  br i1 %6, label %L30, label %L48.loopexit, !dbg !84

L48.loopexit:                                     ; preds = %L30
  br label %L48, !dbg !84

L48:                                              ; preds = %L48.loopexit, %top
  ret void, !dbg !84
}
remark: genericmemory.jl:242:0: loop not vectorized: call instruction cannot be vectorized

@giordano
Copy link
Contributor

I'm currently away from keyboard, can you please try with https://github.com/JuliaArrays/FixedSizeArrays.jl#f62e4012536acc6aea9e282fedcb3dd6fd09f5a9? I'd hope that by reverting JuliaArrays/FixedSizeArrays.jl#68 we now get automatic bounds checking elision and vectorised stores. You should also be able to check the same by running your example above on a Memory{Float64} instead of a FixedSizeArray

@vchuravy
Copy link
Member

So the core of the snippet above btw is that gc_loaded and co didn't get hoisted into the pre-header.

On: https://github.com/JuliaArrays/FixedSizeArrays.jl#f62e4012536acc6aea9e282fedcb3dd6fd09f5a9

I see vectorization, but it seems to rely on IRCE? So there iss a OOB branch.

@giordano
Copy link
Contributor

On: https://github.com/JuliaArrays/FixedSizeArrays.jl#f62e4012536acc6aea9e282fedcb3dd6fd09f5a9

I see vectorization, but it seems to rely on IRCE? So there iss a OOB branch.

To be clear, on FixedSizeArrays.jl#f62e4012536acc6aea9e282fedcb3dd6fd09f5a9, which is before JuliaArrays/FixedSizeArrays.jl#68, we're just using setindex!(::Memory), so if we can cleanly remove boundschecking on Memory when safe to do it while retaining vectorised stores, we'd automatically fix all issues in FixedSizeArrays (we'd need to revert JuliaArrays/FixedSizeArrays.jl#68 to get vectorised stores back, but I'd be happy to have everything fixed upstream, I don't care much about the specific solution in that PR)

@vchuravy
Copy link
Member

So the issue is that we need to move some stuff out of the loop, and it seems we were able to do so on the old version, but on the new version LICM failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] performance Must go faster
Projects
None yet
Development

No branches or pull requests

3 participants