Performance regression in summing over StepRange and UnitRange (no longer O(1)) #39700

KristofferC · 2021-02-17T07:13:58Z

Code to repro:

using BenchmarkTools

A = 100000:-1:1
function perf_sumcartesian(A)
    s = zero(eltype(A))
    for I in CartesianIndices(size(A))
        val = Base.unsafe_getindex(A, I)
        s += val
    end
    return s
end

@btime perf_sumcartesian($A)

On master:

  52.115 μs (0 allocations: 0 bytes)

On 1.5:

  5.850 ns (0 allocations: 0 bytes

The text was updated successfully, but these errors were encountered:

KristofferC · 2021-02-17T09:15:51Z

Bisected to 17a3c77 cc @johnnychen94

kimikage · 2021-02-17T09:29:30Z

Just FYI, the same workaround as #38086 is available. However, this issue is not the issue with SIMD ( simd_index). The @simd here only assists in loop unrolling (and/or LICM), not vectorization.

function perf_sumcartesian_simd(A)
    s = zero(eltype(A))
    @inbounds @simd for I in CartesianIndices(size(A))
        val = Base.unsafe_getindex(A, I)
        s += val
    end
    return s
end

julia> @btime perf_sumcartesian_simd($A)
  6.000 ns (0 allocations: 0 bytes)

kimikage · 2021-02-17T12:25:39Z

Changing the implementation of the iterator will change the optimization result.
However, I cannot find any problem with the current implementation of iterate in terms of inlining cost, type stability or bounds checking.

using Base.IteratorsMD: __inc
@inline function Base.iterate(iter::CartesianIndices{1}, state) # FIXME: this is just for experimentation
    next = iterate(iter.indices[1], state.I[1])
    next === nothing && return nothing
    return CartesianIndex(next[1]), CartesianIndex(next[1])
end

Edit
This is just for experimentation, but I think it might be a good idea to delegate all the calculations of range and step to the iterators.
However, since there is no guarantee that the index and state will match, each must be managed separately. That can increase the overhead in high dimensional indices.

kimikage · 2021-02-18T03:55:21Z

It does not seem to be optimized when using __is_valid_range. However, it is better to have it for safety. Iterators such as UnitRange are not range-checked, though.

julia/base/multidimensional.jl

Lines 412 to 420 in 7853ddd

    
           @inline __is_valid_range(I, rng::AbstractUnitRange) = I in rng 
        
           @inline function __is_valid_range(I, rng::OrdinalRange) 
        
               if step(rng) > 0 
        
                   lo, hi = first(rng), last(rng) 
        
               else 
        
                   lo, hi = last(rng), first(rng) 
        
               end 
        
               lo <= I <= hi 
        
           end

julia> iterate(1:10, -100)
(-99, -99)

julia> iterate(1:10, 100)
(101, 101)

Edit:
The implementation of state is not public, so I don't think we need to take care of the cases where the user breaks the state.

kimikage · 2021-02-19T06:46:27Z

This is just for experimentation, but I think it might be a good idea to delegate all the calculations of range and step to the iterators.

The implementation is quite simple. However, it is unclear whether this breaking change could be widely accepted. (Edit: Of course, I won't be proposing this for v1.6.)

cc: @johnnychen94, @timholy

Edit2:
Hold on a second. Isn't this a reinvention of ProductIterator? 😂

my original intention

import Base: iterate
using Base: tail, heads
using Base.Iterators: Reverse

# in this style,
# state = ((ind1, state1), (ind2, state2), ..., (indN, stateN))

@inline function iterate(iter::CartesianIndices)
    next = _iterate_nested_iters(iter.indices)
    next === nothing && return nothing
    return CartesianIndex(map(n -> n[1], next)), next
end
@inline function iterate(iter::CartesianIndices, state)
    next = _iterate_nested_iters(iter.indices, state)
    next === nothing && return nothing
    return CartesianIndex(map(n -> n[1], next)), next
end
# please use `reverse()` instead of `Reverse()`
@inline function iterate(r::Reverse{<:CartesianIndices}) 
    iterate(reverse(r.itr))
end
@inline function iterate(r::Reverse{<:CartesianIndices}, state)
    iterate(reverse(r.itr), state)
end


function _iterate_nested_iters(indices)
    next = iterate(indices[1])
    next === nothing && return nothing
    length(indices) == 1 && return (next, )
    tnext = _iterate_nested_iters(tail(indices))
    tnext === nothing && return nothing
    return (next, tnext...)
end
function _iterate_nested_iters(indices, state)
    next = iterate(indices[1], state[1][2])
    if next === nothing
        length(indices) == 1 && return nothing
        tnext = _iterate_nested_iters(tail(indices), tail(state))
        tnext === nothing && return nothing
        h = iterate(indices[1])
        h === nothing && return nothing
        return (h, tnext...)
    end
    return (next, tail(state)...)
end

😅

using Base.Iterators
using Base.Iterators: ProductIterator

@inline function iterate(iter::CartesianIndices, state=nothing)
    p = ProductIterator(iter.indices)
    next = state === nothing ? iterate(p) : iterate(p, state)
    next === nothing && return nothing
    return CartesianIndex(next[1]), next[2]
end

kimikage · 2021-02-19T15:15:35Z

xref: #35074, #35036, #36832

brenhinkeller · 2022-11-19T21:58:38Z

Still reproduces on 1.8.3 and 1.9.0-alpha1

adienes · 2024-01-17T22:02:30Z

I believe no longer reproduces

KristofferC added performance Must go faster regression Regression in behavior compared to a previous version labels Feb 17, 2021

KristofferC changed the title ~~Performance regression in summing over StepRange (no longer O(1))~~ Performance regression in summing over StepRange and UnitRange (no longer O(1)) Feb 17, 2021

kimikage mentioned this issue Mar 29, 2021

performance regression on high-dimensional array iteration using CartesianIndices (no simd) #38073

Open

bkamins mentioned this issue Nov 14, 2022

Spurious performance regression in Julia 1.8 vs 1.7 for @time in top-level-scope #47561

Closed

oscardssmith closed this as completed Jan 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression in summing over StepRange and UnitRange (no longer O(1)) #39700

Performance regression in summing over StepRange and UnitRange (no longer O(1)) #39700

KristofferC commented Feb 17, 2021

KristofferC commented Feb 17, 2021 •

edited

Loading

kimikage commented Feb 17, 2021 •

edited

Loading

kimikage commented Feb 17, 2021 •

edited

Loading

kimikage commented Feb 18, 2021 •

edited

Loading

kimikage commented Feb 19, 2021 •

edited

Loading

kimikage commented Feb 19, 2021

brenhinkeller commented Nov 19, 2022

adienes commented Jan 17, 2024

Performance regression in summing over StepRange and UnitRange (no longer O(1)) #39700

Performance regression in summing over StepRange and UnitRange (no longer O(1)) #39700

Comments

KristofferC commented Feb 17, 2021

KristofferC commented Feb 17, 2021 • edited Loading

kimikage commented Feb 17, 2021 • edited Loading

kimikage commented Feb 17, 2021 • edited Loading

kimikage commented Feb 18, 2021 • edited Loading

kimikage commented Feb 19, 2021 • edited Loading

kimikage commented Feb 19, 2021

brenhinkeller commented Nov 19, 2022

adienes commented Jan 17, 2024

KristofferC commented Feb 17, 2021 •

edited

Loading

kimikage commented Feb 17, 2021 •

edited

Loading

kimikage commented Feb 17, 2021 •

edited

Loading

kimikage commented Feb 18, 2021 •

edited

Loading

kimikage commented Feb 19, 2021 •

edited

Loading