Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

propagate inbounds in a few more places for memoryref #56151

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

vchuravy
Copy link
Member

memoryref(A, ind) does an internal boundscheck and if we want @inbounds to ellide all boundscheck
we must use @propagate_inbounds.

Follow-up to #55902

@giordano
Copy link
Contributor

Thanks for looking into this! However on this branch with

function f!(v)
    for idx in eachindex(v)
        v[idx] = 1.0
    end
end
code_llvm(f!, (Memory{Float64},))

I still see boundschecking branches, which aren't there with

code_llvm(f!, (Vector{Float64},))

However Vector uses MemoryRef instead of Memory, so that's going through a completely different code path, but feels like Memory should be able to behave similarly?

@vchuravy
Copy link
Member Author

So one thing I don't understand is why memoryrefset! emits it's own boundscheck when memoryref and memoryrefnew alreay check accesses.

@vchuravy
Copy link
Member Author

vchuravy commented Oct 14, 2024

First note: The issue @giordano is seeing is not due to @_propagate_inbounds. Both codes emit a boundscheck uncondtionally. But in one case LLVM can fully eliminate it, IRCE can move it out of the loop, but it can't be eliminated fully.

So Memory does a fair bit more complicated check than Vector...

For array:

L17:                                              ; preds = %L13
  %6 = sub i64 %value_phi3, 1
  %"v::Array.size_ptr5" = getelementptr inbounds i8, ptr %"v::Array", i32 16
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %"v::Array.size6", ptr align 8 %"v::Array.size_ptr5", i64 8, i1 false)
  %bitcast = load i64, ptr %"v::Array.size6", align 8
  %7 = icmp ult i64 %6, %bitcast
  %8 = xor i1 %7, true
  br i1 %8, label %L26, label %L25

L25:                                              ; preds = %L17
  br label %L29

L26:                                              ; preds = %L17
  %9 = getelementptr inbounds i8, ptr %"new::Tuple", i32 0
  store i64 %value_phi3, ptr %9, align 8
  call void @j_throw_boundserror_9803(ptr %"v::Array", ptr nocapture readonly %"new::Tuple") #5
  call void @ijl_error(ptr @"_j_str_(Internal Error - IR Vali...#2")
  unreachable

For Memory

; ┌ @ /home/vchuravy/src/julia/base/genericmemory.jl:240 within `setindex!`
; │┌ @ boot.jl:545 within `memoryref`
    %memory_data_ptr = getelementptr inbounds { i64, ptr }, ptr %"v::GenericMemory", i32 0, i32 1
    %memoryref_data = load ptr, ptr %memory_data_ptr, align 8
    %6 = insertvalue { ptr, ptr } zeroinitializer, ptr %memoryref_data, 0
    %memory_ref = insertvalue { ptr, ptr } %6, ptr %"v::GenericMemory", 1
; │└
   %memoryref_offset = sub i64 %value_phi3, 1
   %7 = getelementptr inbounds { i64, ptr }, ptr %"v::GenericMemory", i32 0, i32 0
   %memory_len = load i64, ptr %7, align 8
   %8 = add nuw i64 %memory_len, %memory_len
   %9 = add i64 %memoryref_offset, %memory_len
   %memoryref_ovflw = icmp uge i64 %9, %8
   %memoryref_byteoffset = mul i64 %memoryref_offset, 8
   %memoryref_data_byteoffset = getelementptr i8, ptr %memoryref_data, i64 %memoryref_byteoffset
   %10 = getelementptr inbounds { i64, ptr }, ptr %"v::GenericMemory", i32 0, i32 0
   %memory_len5 = load i64, ptr %10, align 8
   %memory_data_ptr6 = getelementptr inbounds { i64, ptr }, ptr %"v::GenericMemory", i32 0, i32 1
   %memory_data = load ptr, ptr %memory_data_ptr6, align 8
   %11 = ptrtoint ptr %memory_data to i64
   %12 = ptrtoint ptr %memoryref_data_byteoffset to i64
   %13 = sub i64 %12, %11
   %memoryref_bytelen = mul nuw nsw i64 %memory_len5, 8
   %memoryref_isinbounds = icmp ult i64 %13, %memoryref_bytelen
   %14 = xor i1 %memoryref_ovflw, true
   %"memoryref_isinbounds&notovflw" = and i1 %14, %memoryref_isinbounds
   br i1 %"memoryref_isinbounds&notovflw", label %idxend, label %oob

My hypothesis is that likely somethhing with the overflow and byte length is tripping LLVM up:

L11:                                              ; preds = %idxend, %scalar.ph
   %value_phi3 = phi i64 [ %17, %idxend ], [ %bc.resume.val, %scalar.ph ]
   %memoryref_offset = shl i64 %value_phi3, 3
   %memoryref_byteoffset = add i64 %memoryref_offset, -8
   %memoryref_isinbounds = icmp ult i64 %memoryref_byteoffset, %memoryref_bytelen
   br i1 %memoryref_isinbounds, label %idxend, label %oob

...

L11.postloop:                                     ; preds = %idxend.postloop, %postloop
  %value_phi3.postloop = phi i64 [ %20, %idxend.postloop ], [ %value_phi3.copy, %postloop ]
;  @ REPL[4]:3 within `#5`
; ┌ @ /home/vchuravy/src/julia/base/genericmemory.jl:240 within `setindex!`
   %memoryref_offset.postloop = add nsw i64 %value_phi3.postloop, -1
   %19 = add nuw nsw i64 %.unbox, %memoryref_offset.postloop
   %memoryref_ovflw.not.postloop = icmp ult i64 %19, %1
   %memoryref_byteoffset.postloop = shl i64 %memoryref_offset.postloop, 3
   %memoryref_isinbounds.postloop = icmp ult i64 %memoryref_byteoffset.postloop, %memoryref_bytelen
   %"memoryref_isinbounds&notovflw.postloop" = and i1 %memoryref_ovflw.not.postloop, %memoryref_isinbounds.postloop
   br i1 %"memoryref_isinbounds&notovflw.postloop", label %idxend.postloop, label %oob

@giordano
Copy link
Contributor

So Memory does a fair bit more complicated check than Vector...

When I last looked at it, my understanding was that the only difference is that Vector can skip a memoryrefnew(mem) call inside setindex! because it has already the memory ref. The rest of the two setindex! methods looked identical to my naive eyes. In JuliaArrays/FixedSizeArrays.jl#70 (comment) I simply replaced Memory with MemoryRef and recovered the same behaviour as Vector, as I could avoid the memoryrefnew(mem) call.

@vchuravy vchuravy requested a review from vtjnash October 14, 2024 14:15
@oscardssmith
Copy link
Member

One possible approach is to remove the boundscheck from the memoryref* intrinsics and make them the responsibility of Julia.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. Though I also agree with Oscar that it may make sense to make this intrinsic specify that OOB access is always UB for the intrinsic itself, then apply no-ub as a function attribute around it when known to be absolutely safe?

@nsajko nsajko added the arrays [a, r, r, a, y, s] label Oct 15, 2024
@nsajko nsajko added the performance Must go faster label Dec 3, 2024
return A
end

function setindex!(A::Memory{T}, x, i1::Int, i2::Int, I::Int...) where {T}
@inline
@_propagate_inbounds_meta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a silly question, but does the @inbounds if destp < srcp || destp > endp above (in unsafe_copyto!) apply @inbounds to the else branch, too? Also, shouldn't the inbounds/prop inbounds pattern be applied to getindex, too?

@giordano giordano force-pushed the vc/propagate_inbounds branch from 874bc4a to e1205b1 Compare January 21, 2025 11:22
@giordano giordano force-pushed the vc/propagate_inbounds branch from e1205b1 to c7cd406 Compare January 21, 2025 11:25
@giordano
Copy link
Contributor

It took me three attempts, but I think now I managed to rebase this PR correctly 🙂

@@ -281,7 +281,7 @@ the same manner as C.
"""
function unsafe_copyto!(dest::Array, doffs, src::Array, soffs, n)
n == 0 && return dest
unsafe_copyto!(memoryref(dest.ref, doffs), memoryref(src.ref, soffs), n)
@inbounds unsafe_copyto!(memoryref(dest.ref, doffs), memoryref(src.ref, soffs), n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants