Skip to content

Faster iteration for an IdOffsetRange #215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 11, 2021

Conversation

jishnub
Copy link
Member

@jishnub jishnub commented Mar 1, 2021

Fixes #214

This was not an issue with bounds-checking, but with iterating over an IdOffsetRange. Instead we may convert it to a UnitRange while iterating.

After this:

julia> x = zeros(Int, 10_000_000);

julia> y = OffsetVector(x, 0);

julia> fill1d(x) = for i in axes(x,1); x[i] = i; end
fill1d (generic function with 1 method)

julia> @btime fill1d($x);
  11.666 ms (0 allocations: 0 bytes)

julia> @btime fill1d($y);
  11.884 ms (0 allocations: 0 bytes)

@codecov
Copy link

codecov bot commented Mar 1, 2021

Codecov Report

Merging #215 (b69dab9) into master (f5cd050) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #215   +/-   ##
=======================================
  Coverage   95.28%   95.28%           
=======================================
  Files           5        5           
  Lines         424      424           
=======================================
  Hits          404      404           
  Misses         20       20           
Impacted Files Coverage Δ
src/axes.jl 100.00% <100.00%> (ø)
src/utils.jl 97.82% <100.00%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5cd050...b69dab9. Read the comment docs.

@timholy
Copy link
Member

timholy commented Mar 1, 2021

It seems that there's a failure to automatically elide the bounds-checking. I'd rather figure out why.

One concern with solutions like this is that it potentially changes behavior. What if you can't coerce to a UnitRange? What if you define an AbstractRange type with side effects?

tim@diva:/tmp$ juliam --startup-file=no -q
julia> struct MyUR <: AbstractUnitRange{Int} end

julia> Base.iterate(::MyUR) = (println("starting"); return (1, false))

julia> Base.iterate(::MyUR, isdone::Bool) = (println("continuing"); return isdone ? nothing : (2, true))

julia> Base.first(::MyUR) = 1

julia> Base.last(::MyUR) = 2

julia> r = MyUR();

julia> for i in r
           println(i)
       end
starting
1
continuing
2
continuing

julia> UnitRange(r)
1:2

so UnitRange(r) destroys any special behavior of the parent iterator.

@jishnub
Copy link
Member Author

jishnub commented Mar 2, 2021

The way I look at this is: IdOffsetRange is an internal type to OffsetArrays, and the package is free to define how to iterate over it to suit its goals. A wrapper type is not expected to have a one-to-one correspondence with the parent, and in this case the correspondence is in values and axes, and not necessarily in behavior. If one desires the iteration behavior of MyUR, they may iterate over the parent or construct an appropriate type.

Regarding the possibility of coercion to a UnitRange: this requires that the parent defines its first and last values, which should be common for AbstractUnitRanges. In any case this is not strictly necessary, we may use the fact that IdOffsetRange has a unit step to define the iteration.

Although I do agree that tracing down why the bounds check is not being elided would be the ideal solution. I'm not sure how to do that though.

@jishnub
Copy link
Member Author

jishnub commented Mar 4, 2021

For reasons that I don't fully understand, bounds checking and populating a 2d array is faster if we coerce the axes to UnitRanges before iterating over them.

On master:

julia> x2d = zeros(Int, 4000, 4000);

julia> y2d = OffsetArray(x2d, 0, 0);

julia> fill2d(x) = for j in axes(x,2); for i in axes(x,1); x[i,j] = i + j; end; end
fill2d (generic function with 1 method)

julia> @btime fill2d($x2d);
  17.113 ms (0 allocations: 0 bytes)

julia> @btime fill2d($y2d);
  31.932 ms (0 allocations: 0 bytes)

Using e8e184c:

julia> @btime fill2d($y2d);
  19.826 ms (0 allocations: 0 bytes)

After c3e04b5

julia> @btime fill2d($y2d);
  17.588 ms (0 allocations: 0 bytes)

@timholy Could you investigate this performance issue when you have time? This is a significant improvement that's possible.

@timholy
Copy link
Member

timholy commented Mar 4, 2021

I will try to get to this within a week or so. This kind of thing is tricky, no doubt about it.

@johnnychen94
Copy link
Member

johnnychen94 commented Mar 18, 2021

Surprisingly, by changing e8e184c to

- Base.iterate(r::IdOffsetRange) = isempty(r) ? nothing : (first(r), first(r))
+ @inline function Base.iterate(r::IdOffsetRange)
+   if isempty(r)
+       return nothing
+   else
+       next = first(r)
+       return (next, next)
+   end
+ end

I get

julia> @btime fill2d($x2d);
  7.454 ms (0 allocations: 0 bytes)

julia> @btime fill2d($y2d);
  7.685 ms (0 allocations: 0 bytes)
  7.644 ms (0 allocations: 0 bytes) # After c3e04b5
  10.894 ms (0 allocations: 0 bytes) # Using e8e184c

I guess the still existing 0.3ms gap lies in last(r), where the extra computation seems unavoidable if we still want to keep generality.

One concern with solutions like this is that it potentially changes behavior. What if you can't coerce to a UnitRange? What if you define an AbstractRange type with side effects?

I feel it be okay to specialize the method for common use cases that can be converted to a UnitRange. By saying this, I'm imagining something like:

@inline Base.iterate(r::IdOffsetRange{T, I}, i...) where {T, I} = _iterate(T, r, i...)

_iterate(::Type{<:UnitRange}, r, i...) = iterate(UnitRange(r), i...)
@inline function _iterate(::Type, r::IdOffsetRange)
    if isempty(r)
        return nothing
    else
        next = first(r)
        return (next, next)
    end
end
@inline function _iterate(::Type, r::IdOffsetRange, this)
    this == last(r) && return nothing
    next = this + one(this)
    (next, next)
end

But I'm really unsure if this worths it.

@jishnub
Copy link
Member Author

jishnub commented Mar 18, 2021

The common parent range is perhaps Base.OneTo and not UnitRange. In essence this is equivalent to an UnitRange, so such an approach makes sense. However it'll be better if this can be solved generally.

@jishnub
Copy link
Member Author

jishnub commented Apr 11, 2021

I've added a specialized method for Base.OneTo but this is not really ideal

@jishnub
Copy link
Member Author

jishnub commented Jul 5, 2021

With the iteration defined as it is now

julia> dim = 1000
1000

julia> x2d = Array{Float64}(undef, 2dim, 2dim);

julia> fill2d(x) = for j in axes(x,2); for i in axes(x,1); x[i,j] = i + j; end; end
fill2d (generic function with 1 method)

julia> @btime fill2d($x2d);
  5.247 ms (0 allocations: 0 bytes)

julia> @btime fill2d($(OffsetArray(x2d)));
  4.748 ms (0 allocations: 0 bytes)

So wrapping an OffsetArray actually makes the function faster. This is unusual and other functions in the benchmark don't behave this way, however this is interesting indeed.

@johnnychen94
Copy link
Member

Yep, I've also noticed this in JuliaLang/julia#38073 (comment)

@jishnub
Copy link
Member Author

jishnub commented Jul 5, 2021

Good to merge this? There's an evident performance boost for OneTo and no change to the general case

Copy link
Member

@johnnychen94 johnnychen94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me, but tweaking the performance is a little bit tricky and I didn't really look into it. Feel free to merge when you're confident.

src/axes.jl Outdated
# and has no specialized iteration defined for it,
# so we may add the offset to the range directly and iterate over the result
# This gets around the performance issue described in issue #214
@inline _iterate(r::IdOffsetRange{<:Integer, <:Base.OneTo}, i...) = iterate(r.parent .+ r.offset, i...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of it, r.parent .+ r.offset is a little bit tricky to me, maybe manually expand it?

Suggested change
@inline _iterate(r::IdOffsetRange{<:Integer, <:Base.OneTo}, i...) = iterate(r.parent .+ r.offset, i...)
@inline function _iterate(r::IdOffsetRange{T, <:Base.OneTo}, i...) where T
r0 = UnitRange{T}(r.offset + 1, r.offset + r.stop)
iterate(r0, i...)
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OneTo has special broadcasting methods defined to cover this, so I don't think this will make much of a difference. Are you concerned about the behavior changing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is expected to be identical, but the current improvement is perhaps a compiler trick so I'm wondering if a manual inline/expand helps stabilize this optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll add some util functions that might come in useful at other points as well.

@jishnub jishnub merged commit 5ee74cf into JuliaArrays:master Jul 11, 2021
@jishnub jishnub deleted the IOR_iterate branch August 18, 2021 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bounds-checking overhead for an IdOffsetRange
3 participants