WIP: a more principled take on dimensional reduction inits #55318
base: master
Conversation
test/reducedim.jl
Outdated
@@ -209,9 +209,9 @@ end
 for f in (minimum, maximum)
     @test_throws "reducing over an empty collection is not allowed" f(A, dims=1)
-    @test isequal(f(A, dims=2), zeros(Int, 0, 1))
+    @test_broken isequal(f(A, dims=2), zeros(Int, 0, 1))
I marked these as broken as a reminder to myself; I'm not entirely sure that we want/need to preserve these behaviors.
This is now split out into #55628.
I think I've addressed the most egregious of the performance issues, but I've only done spot tests so far. Now I need to tackle those stdlibs.
base/reducedim.jl
Outdated
# * One value: Fill with `mapreduce_first` & return
# * Two+ values: Fill with `op(f(a1), f(a2))`, proceed <--- this is the hard case:
#   * Proceeding involves skipping the first two inner iterations
_mapreduce_dim(::Type{F}, op::G, ::_InitialValue, A::AbstractArrayOrBroadcasted, dims) where {F, G} = _mapreduce_dim(x->F(x), op, Base._InitialValue(), A, dims)
I needed this to get `maximum(Float64, reshape(1:4, 2, :); dims = 2)` to infer. Something something specialization heuristics.
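For context, a hedged sketch of the heuristic I believe is at play (my illustration, not part of the diff): Julia avoids specializing a method on a `::Type` argument that is merely passed through to another call, whereas a closure like `x -> F(x)` carries `F` in its own concrete type, so the callee specializes and inference succeeds.

consume(g, x) = g(x)   # hypothetical stand-in for the downstream reduction machinery

# `F` is only passed through here, so Julia may not specialize on it:
passthrough(::Type{F}, x) where {F} = consume(F, x)

# Wrapping `F` in a closure hands `consume` a concretely-typed callable:
wrapped(::Type{F}, x) where {F} = consume(y -> F(y), x)

using Test
@inferred wrapped(Float64, 1)   # returns 1.0 and infers Float64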
@@ -610,9 +610,10 @@ function unordered_test_for_extrema(a; dims_test = ((), 1, 2, (1,2), 3))
 for dims in dims_test
     vext = extrema(a; dims)
     vmin, vmax = minimum(a; dims), maximum(a; dims)
-    @test isequal(extrema!(copy(vext), a), vext)
+    @test isequal(extrema!(similar(vext, NTuple{2, eltype(a)}), a), vext)
This fails as originally written because this is reducing over a `Union{T, Missing}`, and depending upon the values in `a`, `vext` might incrementally widen to a `T[]` or a `Missing[]` or a `Union{T, Missing}[]`... which then isn't able to be used in general. This would be representative of a class of failure that this PR might introduce in the ecosystem.
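To illustrate the hazard (a hedged sketch of my own, assuming the PR's value-dependent widening): the concrete array type returned by a reduction over `Union{Int, Missing}` data would depend on the values, so a destination allocated from one input may be too narrow for another.

a_ints  = Union{Int, Missing}[1 2; 3 4]
a_mixed = Union{Int, Missing}[1 2; 3 missing]

r = maximum(a_ints; dims = 1)   # with incremental widening: narrows to Matrix{Int}
# maximum!(r, a_mixed)          # would then fail: `missing` can't be
#                               # stored into a Matrix{Int}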
The custom allocators are needed for SparseArrays.
    end
else
    # Take care of the smallest cartesian "rectangle" that includes our two peeled elements
    small_inner, large_inner = _split2(inner1)
This optimization is not valid if `_split2` decides not to split the last reduced dimension: concatenated `CartesianIndices` blocks that split an earlier dimension visit elements in a different order than the array's native column-major traversal, e.g.

julia> I1 = collect(Iterators.flatten([CartesianIndices((1:2, 10)), CartesianIndices((3:10, 10))]));

julia> I2 = collect(Iterators.flatten([CartesianIndices((10, 1:1)), CartesianIndices((10, 2:10))]));

julia> I1 == I2
false

julia> sort(I1) == I2
true
A better solution might be skipping the first 2 elements manually, e.g.

indsAt, indsRt = safe_tail(Aaxs), safe_tail(raxs)
keep, Idefault = Broadcast.shapeindexer(indsRt)
skip = mergeindices(is_outer_dim, a_inds, i1:i2)
skipt = Base.IteratorsMD.split(skip, Val(1))[2]
if length(raxs[1]) == 1 && length(Aaxs[1]) > 1 # reducedim1(R, A)
    i1 = first(raxs[1])
    @inbounds for IA in CartesianIndices(indsAt)
        IR = Broadcast.newindex(IA, keep, Idefault)
        r = R[i1, IR]
        offset = IA in skipt ? 2 : 0
        @simd for i in Aaxs[1][begin + offset:end]
            r = op(r, f(A[i, IA]))
        end
        R[i1, IR] = r
    end
else
    @inbounds for IA in CartesianIndices(indsAt)
        IA in skipt && continue
        IR = Broadcast.newindex(IA, keep, Idefault)
        @simd for i in Aaxs[1]
            R[i, IR] = op(R[i, IR], f(A[i, IA]))
        end
    end
end
for i in Iterators.drop(outer_inds, 1)
    innerj = mergeindices(is_outer_dim, i, a_inds)
    j1, j2 = innerj
    @inbounds c1, c2 = A[j1], A[j2]
    @inbounds R[i] = _mapreduce_start(f, op, init, c1, c2)
end
This loop might need manual inner loop expansion, as it blocks possible vectorization if there's no reduction on dim 1. Another solution is adding `@simd` support to `drop(::CartesianIndices)`:
https://github.com/N5N3/julia/blob/fd1b3a19770d4948451c2d32cf894ea1b9536038/base/multidimensional.jl#L703-L716
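For concreteness, a hedged sketch of what that manual expansion could look like (my illustration reusing the helpers from the hunk above, not the linked implementation): peel the first axis out of the outer iteration so the innermost loop is a plain range that `@simd` can vectorize, instead of iterating `Iterators.drop(outer_inds, 1)` directly.

# Assumes `outer_inds` is a CartesianIndices; `mergeindices`,
# `is_outer_dim`, `a_inds`, and `_mapreduce_start` are from the hunk above.
outer_tail = CartesianIndices(Base.tail(outer_inds.indices))
ax1 = outer_inds.indices[1]
for It in outer_tail
    # `drop(outer_inds, 1)` skips exactly one index: the first along
    # dim 1 at the first tail index. Reproduce that with an offset.
    start = It == first(outer_tail) ? 2 : 1
    @simd for i1 in ax1[start:end]
        i = CartesianIndex(i1, It)
        innerj = mergeindices(is_outer_dim, i, a_inds)
        j1, j2 = innerj
        @inbounds c1, c2 = A[j1], A[j2]
        @inbounds R[i] = _mapreduce_start(f, op, init, c1, c2)
    end
end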
As part of the work towards #52397 and #53945, this drastically simplifies the initialization of dimensional reductions and takes steps towards a unification of the (non-dimensional) `mapreduce` with the `dims` flavor.

This completely removes all guesswork on initial values for the dimensional reductions. There are 2*3 possible ways we should start a reduction, depending upon how many values are in the reduction and if there's an init (see the REPL sketch after this list):

- Zero values: `mapreduce_empty` (usually an error, sometimes an unambiguous value like `0.0`)
- One value (`a`): with an init, `op(init, f(a))`. Otherwise: `mapreduce_first` (typically `f(a)`)
- Two+ values (`a1`, `a2`): with an init, `op(op(init, f(a1)), f(a2))` and continue. Otherwise: `op(f(a1), f(a2))` and continue
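To make those starting cases concrete, here's a quick REPL sketch against today's non-`dims` entry points (abbreviated output from a current Julia session; `Base.mapreduce_first` is an internal helper):

julia> maximum(Int[])                        # zero values, no init
ERROR: ArgumentError: reducing over an empty collection is not allowed [...]

julia> sum(Int[])                            # zero values, but `+` has an unambiguous empty value
0

julia> mapreduce(abs, +, [-3]; init = 10)    # one value with an init: op(init, f(a))
13

julia> Base.mapreduce_first(abs, +, -3)      # one value, no init: typically f(a)
3

julia> mapreduce(abs, +, [-3, 4]; init = 10) # two values with an init
17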
When looking at dimensional reductions, the above are all arrays of multiple such values, which means we need to get an `eltype`. This uses the builtin incremental widening on every single update to the `R` vector. That wouldn't normally be so tricky: it's just two nested `for` loops. The trouble is that the naive double loop leaves 100x+ performance on the table when it doesn't iterate in a cache-friendly manner. So you might indeed want to update a single value in `R` multiple times, and possibly widen as you do so; a minimal sketch of that pattern follows.
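Here's what that pattern looks like in miniature (my illustration, not the PR's implementation): a cache-friendly `dims = 2` reduction over a matrix, updating each `R[i]` once per column and widening `R`'s element type mid-reduction whenever an update demands it.

function naive_reduce_dims2(f, op, A::AbstractMatrix)
    R = map(f, A[:, 1])   # seed each slot from the first column (value-based eltype)
    for j in Iterators.drop(axes(A, 2), 1), i in axes(A, 1)  # cache-friendly: column-major
        v = op(R[i], f(A[i, j]))
        if v isa eltype(R)
            R[i] = v
        else
            # Widen: allocate a wider array, copy what we have, and carry on.
            Rnew = similar(R, Base.promote_typejoin(eltype(R), typeof(v)))
            copyto!(Rnew, R)
            Rnew[i] = v
            R = Rnew
        end
    end
    return R
end

# e.g. naive_reduce_dims2(identity, +, Union{Int, Missing}[1 2; 3 missing])
# seeds R as a Vector{Int} and widens it to Vector{Union{Int, Missing}}
# the moment the reduction hits `missing`.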
It's so very tempting to do the standard Julian `mapreduce(f, op, A, dims) = mapreduce!(f, op, init(f, op, A, dims), A)` idiom. But that simply doesn't work because there's not a single way to initialize the beginning of a reduction. There are SIX. And once we add support for pre-allocated result vectors, you even more so want to be able to just write, e.g., `sum!(r, A) = mapreduce!(identity, +, first_step!(identity, +, r, A), A)`. But at that point you've already done the first step, so you need to be able to continue the reduction from an arbitrary point in the iteration.

This is a WIP because I ~~expect it to have some fairly significant performance regressions at the moment; some may be easy to recover, others may be harder~~ need to fix the stdlib failures and then let nanosoldier loose.

Closes: #45566, closes #47231, closes #55213, closes #31427, closes #41054, closes #54920, closes #42259, closes #38660, closes #21097, closes #32366, closes #54875, closes #43731.
Addresses half of #45562.