-
Notifications
You must be signed in to change notification settings - Fork 46
Faster iteration for an IdOffsetRange #215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #215 +/- ##
=======================================
Coverage 95.28% 95.28%
=======================================
Files 5 5
Lines 424 424
=======================================
Hits 404 404
Misses 20 20
Continue to review full report at Codecov.
|
It seems that there's a failure to automatically elide the bounds-checking. I'd rather figure out why. One concern with solutions like this is that it potentially changes behavior. What if you can't coerce to a tim@diva:/tmp$ juliam --startup-file=no -q
julia> struct MyUR <: AbstractUnitRange{Int} end
julia> Base.iterate(::MyUR) = (println("starting"); return (1, false))
julia> Base.iterate(::MyUR, isdone::Bool) = (println("continuing"); return isdone ? nothing : (2, true))
julia> Base.first(::MyUR) = 1
julia> Base.last(::MyUR) = 2
julia> r = MyUR();
julia> for i in r
println(i)
end
starting
1
continuing
2
continuing
julia> UnitRange(r)
1:2 so |
The way I look at this is: Regarding the possibility of coercion to a Although I do agree that tracing down why the bounds check is not being elided would be the ideal solution. I'm not sure how to do that though. |
For reasons that I don't fully understand, bounds checking and populating a 2d array is faster if we coerce the axes to On master: julia> x2d = zeros(Int, 4000, 4000);
julia> y2d = OffsetArray(x2d, 0, 0);
julia> fill2d(x) = for j in axes(x,2); for i in axes(x,1); x[i,j] = i + j; end; end
fill2d (generic function with 1 method)
julia> @btime fill2d($x2d);
17.113 ms (0 allocations: 0 bytes)
julia> @btime fill2d($y2d);
31.932 ms (0 allocations: 0 bytes) Using e8e184c: julia> @btime fill2d($y2d);
19.826 ms (0 allocations: 0 bytes) After c3e04b5 julia> @btime fill2d($y2d);
17.588 ms (0 allocations: 0 bytes) @timholy Could you investigate this performance issue when you have time? This is a significant improvement that's possible. |
I will try to get to this within a week or so. This kind of thing is tricky, no doubt about it. |
Surprisingly, by changing - Base.iterate(r::IdOffsetRange) = isempty(r) ? nothing : (first(r), first(r))
+ @inline function Base.iterate(r::IdOffsetRange)
+ if isempty(r)
+ return nothing
+ else
+ next = first(r)
+ return (next, next)
+ end
+ end I get julia> @btime fill2d($x2d);
7.454 ms (0 allocations: 0 bytes)
julia> @btime fill2d($y2d);
7.685 ms (0 allocations: 0 bytes)
7.644 ms (0 allocations: 0 bytes) # After c3e04b5
10.894 ms (0 allocations: 0 bytes) # Using e8e184c I guess the still existing 0.3ms gap lies in
I feel it be okay to specialize the method for common use cases that can be converted to a UnitRange. By saying this, I'm imagining something like: @inline Base.iterate(r::IdOffsetRange{T, I}, i...) where {T, I} = _iterate(T, r, i...)
_iterate(::Type{<:UnitRange}, r, i...) = iterate(UnitRange(r), i...)
@inline function _iterate(::Type, r::IdOffsetRange)
if isempty(r)
return nothing
else
next = first(r)
return (next, next)
end
end
@inline function _iterate(::Type, r::IdOffsetRange, this)
this == last(r) && return nothing
next = this + one(this)
(next, next)
end But I'm really unsure if this worths it. |
The common parent range is perhaps |
I've added a specialized method for |
With the iteration defined as it is now julia> dim = 1000
1000
julia> x2d = Array{Float64}(undef, 2dim, 2dim);
julia> fill2d(x) = for j in axes(x,2); for i in axes(x,1); x[i,j] = i + j; end; end
fill2d (generic function with 1 method)
julia> @btime fill2d($x2d);
5.247 ms (0 allocations: 0 bytes)
julia> @btime fill2d($(OffsetArray(x2d)));
4.748 ms (0 allocations: 0 bytes) So wrapping an |
Yep, I've also noticed this in JuliaLang/julia#38073 (comment) |
Good to merge this? There's an evident performance boost for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense to me, but tweaking the performance is a little bit tricky and I didn't really look into it. Feel free to merge when you're confident.
src/axes.jl
Outdated
# and has no specialized iteration defined for it, | ||
# so we may add the offset to the range directly and iterate over the result | ||
# This gets around the performance issue described in issue #214 | ||
@inline _iterate(r::IdOffsetRange{<:Integer, <:Base.OneTo}, i...) = iterate(r.parent .+ r.offset, i...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of it, r.parent .+ r.offset
is a little bit tricky to me, maybe manually expand it?
@inline _iterate(r::IdOffsetRange{<:Integer, <:Base.OneTo}, i...) = iterate(r.parent .+ r.offset, i...) | |
@inline function _iterate(r::IdOffsetRange{T, <:Base.OneTo}, i...) where T | |
r0 = UnitRange{T}(r.offset + 1, r.offset + r.stop) | |
iterate(r0, i...) | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OneTo
has special broadcasting methods defined to cover this, so I don't think this will make much of a difference. Are you concerned about the behavior changing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is expected to be identical, but the current improvement is perhaps a compiler trick so I'm wondering if a manual inline/expand helps stabilize this optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'll add some util functions that might come in useful at other points as well.
Fixes #214
This was not an issue with bounds-checking, but with iterating over an
IdOffsetRange
. Instead we may convert it to aUnitRange
while iterating.After this: