Skip to content

Performance regression in summing over StepRange and UnitRange (no longer O(1)) #39700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
KristofferC opened this issue Feb 17, 2021 · 8 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@KristofferC
Copy link
Member

Code to repro:

using BenchmarkTools

A = 100000:-1:1
function perf_sumcartesian(A)
    s = zero(eltype(A))
    for I in CartesianIndices(size(A))
        val = Base.unsafe_getindex(A, I)
        s += val
    end
    return s
end

@btime perf_sumcartesian($A)

On master:

  52.115 μs (0 allocations: 0 bytes)

On 1.5:

  5.850 ns (0 allocations: 0 bytes
@KristofferC KristofferC added performance Must go faster regression Regression in behavior compared to a previous version labels Feb 17, 2021
@KristofferC KristofferC changed the title Performance regression in summing over StepRange (no longer O(1)) Performance regression in summing over StepRange and UnitRange (no longer O(1)) Feb 17, 2021
@KristofferC
Copy link
Member Author

KristofferC commented Feb 17, 2021

Bisected to 17a3c77 cc @johnnychen94

@kimikage
Copy link
Contributor

kimikage commented Feb 17, 2021

Just FYI, the same workaround as #38086 is available. However, this issue is not the issue with SIMD ( simd_index). The @simd here only assists in loop unrolling (and/or LICM), not vectorization.

function perf_sumcartesian_simd(A)
    s = zero(eltype(A))
    @inbounds @simd for I in CartesianIndices(size(A))
        val = Base.unsafe_getindex(A, I)
        s += val
    end
    return s
end
julia> @btime perf_sumcartesian_simd($A)
  6.000 ns (0 allocations: 0 bytes)

@kimikage
Copy link
Contributor

kimikage commented Feb 17, 2021

Changing the implementation of the iterator will change the optimization result.
However, I cannot find any problem with the current implementation of iterate in terms of inlining cost, type stability or bounds checking.

using Base.IteratorsMD: __inc
@inline function Base.iterate(iter::CartesianIndices{1}, state) # FIXME: this is just for experimentation
    next = iterate(iter.indices[1], state.I[1])
    next === nothing && return nothing
    return CartesianIndex(next[1]), CartesianIndex(next[1])
end

Edit
This is just for experimentation, but I think it might be a good idea to delegate all the calculations of range and step to the iterators.
However, since there is no guarantee that the index and state will match, each must be managed separately. That can increase the overhead in high dimensional indices.

@kimikage
Copy link
Contributor

kimikage commented Feb 18, 2021

It does not seem to be optimized when using __is_valid_range. However, it is better to have it for safety. Iterators such as UnitRange are not range-checked, though.

@inline __is_valid_range(I, rng::AbstractUnitRange) = I in rng
@inline function __is_valid_range(I, rng::OrdinalRange)
if step(rng) > 0
lo, hi = first(rng), last(rng)
else
lo, hi = last(rng), first(rng)
end
lo <= I <= hi
end

julia> iterate(1:10, -100)
(-99, -99)

julia> iterate(1:10, 100)
(101, 101)

Edit:
The implementation of state is not public, so I don't think we need to take care of the cases where the user breaks the state.

@kimikage
Copy link
Contributor

kimikage commented Feb 19, 2021

This is just for experimentation, but I think it might be a good idea to delegate all the calculations of range and step to the iterators.

The implementation is quite simple. However, it is unclear whether this breaking change could be widely accepted. (Edit: Of course, I won't be proposing this for v1.6.)

cc: @johnnychen94, @timholy

Edit2:
Hold on a second. Isn't this a reinvention of ProductIterator? 😂

my original intention
import Base: iterate
using Base: tail, heads
using Base.Iterators: Reverse

# in this style,
# state = ((ind1, state1), (ind2, state2), ..., (indN, stateN))

@inline function iterate(iter::CartesianIndices)
    next = _iterate_nested_iters(iter.indices)
    next === nothing && return nothing
    return CartesianIndex(map(n -> n[1], next)), next
end
@inline function iterate(iter::CartesianIndices, state)
    next = _iterate_nested_iters(iter.indices, state)
    next === nothing && return nothing
    return CartesianIndex(map(n -> n[1], next)), next
end
# please use `reverse()` instead of `Reverse()`
@inline function iterate(r::Reverse{<:CartesianIndices}) 
    iterate(reverse(r.itr))
end
@inline function iterate(r::Reverse{<:CartesianIndices}, state)
    iterate(reverse(r.itr), state)
end


function _iterate_nested_iters(indices)
    next = iterate(indices[1])
    next === nothing && return nothing
    length(indices) == 1 && return (next, )
    tnext = _iterate_nested_iters(tail(indices))
    tnext === nothing && return nothing
    return (next, tnext...)
end
function _iterate_nested_iters(indices, state)
    next = iterate(indices[1], state[1][2])
    if next === nothing
        length(indices) == 1 && return nothing
        tnext = _iterate_nested_iters(tail(indices), tail(state))
        tnext === nothing && return nothing
        h = iterate(indices[1])
        h === nothing && return nothing
        return (h, tnext...)
    end
    return (next, tail(state)...)
end

😅

using Base.Iterators
using Base.Iterators: ProductIterator

@inline function iterate(iter::CartesianIndices, state=nothing)
    p = ProductIterator(iter.indices)
    next = state === nothing ? iterate(p) : iterate(p, state)
    next === nothing && return nothing
    return CartesianIndex(next[1]), next[2]
end

@kimikage
Copy link
Contributor

xref: #35074, #35036, #36832

@brenhinkeller
Copy link
Contributor

Still reproduces on 1.8.3 and 1.9.0-alpha1

@adienes
Copy link
Member

adienes commented Jan 17, 2024

I believe no longer reproduces

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

5 participants