
WIP: a more principled take on dimensional reduction inits #55318

Open · wants to merge 18 commits into base: master
Conversation

@mbauman (Member) commented Jul 31, 2024

As part of the work towards #52397 and #53945, this drastically simplifies the initialization of dimensional reductions and takes steps towards unifying the (non-dimensional) mapreduce with the dims flavor.

This completely removes all guesswork on initial values for the dimensional reductions. There are 2 × 3 = 6 possible ways we should start a reduction, depending upon how many values are in the reduction and whether there's an init (see the sketch after this list):

  • 0 values:
    • Have init? Return it. Otherwise:
    • Return mapreduce_empty (usually an error, sometimes an unambiguous value like 0.0)
  • 1 value (a):
    • Have init? Return op(init, f(a)). Otherwise:
    • Return mapreduce_first (typically f(a))
  • 2+ values (a1, a2):
    • Have init? Start every chain with op(op(init, f(a1)), f(a2)) and continue. Otherwise:
    • Start every chain with op(f(a1), f(a2)) and continue
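Concretely, here's a minimal sketch of those six starting rules for the linear (non-dims) flavor. This is illustrative only: reduction_start is a hypothetical name, nothing stands in for Base's _InitialValue sentinel, and Base.mapreduce_empty / Base.mapreduce_first are the existing Base hooks referenced above.

# Hypothetical sketch of the six starting cases; `nothing` stands in for the
# no-init sentinel (Base uses `Base._InitialValue()`).
function reduction_start(f, op, init, itr)
    y = iterate(itr)
    if y === nothing                                   # 0 values
        init === nothing || return init
        return Base.mapreduce_empty(f, op, eltype(itr))  # usually an error
    end
    a1, st = y
    y = iterate(itr, st)
    if y === nothing                                   # 1 value
        return init === nothing ? Base.mapreduce_first(f, op, a1) : op(init, f(a1))
    end
    a2, st = y                                         # 2+ values: start the chain...
    v = init === nothing ? op(f(a1), f(a2)) : op(op(init, f(a1)), f(a2))
    while (y = iterate(itr, st)) !== nothing           # ...and continue it
        x, st = y
        v = op(v, f(x))
    end
    return v
end

# reduction_start(abs2, +, nothing, [3]) == 9   (the mapreduce_first path)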

When looking at dimensional reductions, each of the above cases applies to an array of many such values, which means we need to determine an eltype. This uses the builtin incremental widening on every single update to the result array R. That wouldn't normally be so tricky (it's just two nested for loops), but the naive double for loop leaves 100x+ performance on the table when it doesn't iterate in a cache-friendly manner. So you might indeed want to update a single value in R multiple times, and possibly widen as you do so.
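For illustration, here's the cache-friendly loop order for a dims=2 sum over a Matrix, simplified to a fixed op and eltype (no init handling or widening). Note how each R[i] is updated once per column, i.e. many times over the course of the reduction, which is exactly where mid-sweep widening can kick in.

# Simplified sketch: cache-friendly dims=2 reduction over a column-major Matrix.
function sum_dims2_cachefriendly!(R::AbstractVector, A::AbstractMatrix)
    for j in axes(A, 2)           # outer loop over the *reduced* dimension
        @simd for i in axes(A, 1) # inner loop walks memory contiguously
            @inbounds R[i] += A[i, j]
        end
    end
    return R
end
# The naive `for i; for j; R[i] += A[i, j]` order strides across columns and
# can be orders of magnitude slower on large arrays.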

It's so very tempting to do the standard Julian mapreduce(f, op, A, dims) = mapreduce!(f, op, init(f, op, A, dims), A) idiom. But that simply doesn't work, because there isn't a single way to initialize the beginning of a reduction; there are SIX. And once we add support for pre-allocated result vectors, you want all the more to be able to just write, e.g., sum!(r, A) = mapreduce!(identity, +, first_step!(identity, +, r, A), A). But at that point you've already done the first step, so you need to be able to continue the reduction from an arbitrary point in the iteration.
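To make the shape of that concrete, a purely hypothetical sketch: first_step! is the name used above, while continue_from! and mapreduce_sketch! are invented here for illustration and are not part of this PR or Base.

# `first_step!` seeds `r` using whichever of the six starting rules applies and
# returns the iteration state where it left off (it consumed 1-2 elements);
# `continue_from!` resumes the reduction from that arbitrary point.
function mapreduce_sketch!(f, op, r, A)
    state = first_step!(f, op, r, A)
    return continue_from!(f, op, r, A, state)
end
sum_sketch!(r, A) = mapreduce_sketch!(identity, +, r, A)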

This is a WIP because I expect it to have some fairly significant performance regressions at the moment. Some may be easy to recover; others may be harder. Still to do: fix the stdlib failures and then let nanosoldier loose.

Closes: #45566, closes #47231, closes #55213, closes #31427, closes #41054, closes #54920, closes #42259, closes #38660, closes #21097, closes #32366, closes #54875, closes #43731.

Addresses half of #45562.

@mbauman added the "fold" label (sum, maximum, reduce, foldl, etc.) on Jul 31, 2024
@@ -209,9 +209,9 @@ end

for f in (minimum, maximum)
@test_throws "reducing over an empty collection is not allowed" f(A, dims=1)
-    @test isequal(f(A, dims=2), zeros(Int, 0, 1))
+    @test_broken isequal(f(A, dims=2), zeros(Int, 0, 1))
mbauman (Member, Author) commented:

I marked these as broken as a reminder to myself; I'm not entirely sure that we want/need to preserve these behaviors.

mbauman (Member, Author) commented:

This is now split out into #55628.

@mbauman (Member, Author) commented Jul 31, 2024

Looks like this is introducing a regression that's adjacent to #32366 (comment): sum([1 0 missing], dims=2) works on master but is currently broken here (and I don't think it's tested). Edit: now fixed and tested.

@mbauman (Member, Author) commented Jul 31, 2024

I think I've addressed the most egregious of the performance issues, but I've only done spot tests so far. Now I need to tackle those stdlibs.

# * One value: Fill with `mapreduce_first` & return
# * Two+ values: Fill with `op(f(a1), f(a2))`, proceed <--- this is the hard case:
# * Proceeding involves skipping the first two inner iterations
_mapreduce_dim(::Type{F}, op::G, ::_InitialValue, A::AbstractArrayOrBroadcasted, dims) where {F, G} = _mapreduce_dim(x->F(x), op, Base._InitialValue(), A, dims)
mbauman (Member, Author) commented:

I needed this to get maximum(Float64, reshape(1:4, 2, :); dims = 2) to infer. Something something specialization heuristics.
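For context, this matches the documented heuristic that Julia may avoid specializing on a ::Type argument that is merely passed through to another call; capturing it in a closure forces specialization. A hedged illustration (some_kernel is a made-up callee):

# Julia may skip specializing `passthrough` on T, since T is only forwarded;
# the closure in `forced` is a concrete, fully-typed callable per T, which is
# what the `x->F(x)` wrapper above achieves.
passthrough(::Type{T}, x) where {T} = some_kernel(T, x)
forced(::Type{T}, x) where {T} = some_kernel(y -> T(y), x)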

@@ -610,9 +610,10 @@ function unordered_test_for_extrema(a; dims_test = ((), 1, 2, (1,2), 3))
for dims in dims_test
vext = extrema(a; dims)
vmin, vmax = minimum(a; dims), maximum(a; dims)
-        @test isequal(extrema!(copy(vext), a), vext)
+        @test isequal(extrema!(similar(vext, NTuple{2, eltype(a)}), a), vext)
mbauman (Member, Author) commented:

This fails as originally written because this is reducing over a Union{T, Missing}, and depending upon the values in a, vext might incrementally widen to a T[] or a Missing[] or a Union{T, Missing}[]... which then isn't able to be used in general. This would be representative of a class of failure that this PR might introduce in the ecosystem.
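A hedged illustration of that failure class (the exact narrowed types depend on this PR's widening behavior):

a = Union{Int, Missing}[1 2; 3 4]           # declared Union, but no missings
vext = extrema(a; dims = 1)
# With incremental widening, eltype(vext) can narrow to Tuple{Int, Int} (or,
# for other inputs, to Tuple{Missing, Missing}), so a buffer built with
# copy(vext) is not safe to reduce into in general. Preallocating from the
# declared eltype, as the updated test does, always leaves room:
r = similar(vext, NTuple{2, eltype(a)})     # eltype NTuple{2, Union{Int, Missing}}
extrema!(r, a)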

(A short exchange between @tecosaur and @mbauman was marked as off-topic.)

the custom allocators are needed for Sparse Arrays
end
else
# Take care of the smallest cartesian "rectangle" that includes our two peeled elements
small_inner, large_inner = _split2(inner1)
Member (reviewer) commented:

This optimization is not valid if _split2 decides not to split the last reduced dimensions, as

julia> I1 = collect(Iterators.flatten([CartesianIndices((1:2, 10)), CartesianIndices((3:10, 10))]));

julia> I2 = collect(Iterators.flatten([CartesianIndices((10, 1:1)), CartesianIndices((10, 2:10))]));

julia> I1 == I2
false

julia> sort(I1) == I2
true

A better solution might be skipping the first 2 elements manually, e.g.:

        # indsAt/indsRt drop the first (SIMD) dimension; `skipt` marks the
        # outer iterations containing the two already-peeled elements (i1:i2
        # from the surrounding code), which must be skipped or offset below.
        indsAt, indsRt = safe_tail(Aaxs), safe_tail(raxs)
        keep, Idefault = Broadcast.shapeindexer(indsRt)
        skip = mergeindices(is_outer_dim, a_inds, i1:i2)
        skipt = Base.IteratorsMD.split(skip, Val(1))[2]
        if length(raxs[1]) == 1 && length(Aaxs[1]) > 1 #reducedim1(R, A)
            i1 = first(raxs[1])
            @inbounds for IA in CartesianIndices(indsAt)
                IR = Broadcast.newindex(IA, keep, Idefault)
                r = R[i1,IR]
                offset = IA in skipt ? 2 : 0
                @simd for i in Aaxs[1][begin + offset:end]
                    r = op(r, f(A[i, IA]))
                end
                R[i1,IR] = r
            end
        else
            @inbounds for IA in CartesianIndices(indsAt)
                IA in skipt && continue
                IR = Broadcast.newindex(IA, keep, Idefault)
                @simd for i in Aaxs[1]
                    R[i,IR] = op(R[i,IR], f(A[i,IA]))
                end
            end
        end

Comment on lines +218 to +223
for i in Iterators.drop(outer_inds, 1)
innerj = mergeindices(is_outer_dim, i, a_inds)
j1, j2 = innerj
@inbounds c1, c2 = A[j1], A[j2]
@inbounds R[i] = _mapreduce_start(f, op, init, c1, c2)
end
Member (reviewer) commented:

This loop might need manual inner-loop expansion (a sketch follows below), as it currently blocks possible vectorization if there's no reduction on dim1. Another solution is adding @simd support to drop(::CartesianIndices): https://github.com/N5N3/julia/blob/fd1b3a19770d4948451c2d32cf894ea1b9536038/base/multidimensional.jl#L703-L716
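A hedged sketch of that manual expansion, assuming outer_inds is a CartesianIndices whose first dimension is not reduced; mergeindices, is_outer_dim, a_inds, and _mapreduce_start are from the surrounding PR code:

# Peel the first (contiguous) dimension out of `outer_inds` so the innermost
# loop is a plain range that @simd can vectorize.
inner_ax  = axes(outer_inds, 1)
tail_inds = CartesianIndices(Base.tail(axes(outer_inds)))
for (n, It) in enumerate(tail_inds)
    # Only the very first tail iteration skips the one already-handled index
    # (the single element that `Iterators.drop(outer_inds, 1)` removed).
    rng = n == 1 ? (first(inner_ax) + 1:last(inner_ax)) : inner_ax
    @simd for i1 in rng
        i = CartesianIndex(i1, It)
        innerj = mergeindices(is_outer_dim, i, a_inds)
        j1, j2 = innerj
        @inbounds c1, c2 = A[j1], A[j2]
        @inbounds R[i] = _mapreduce_start(f, op, init, c1, c2)
    end
end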

@mbauman added the "help wanted" label (indicates that a maintainer wants help on an issue or pull request) on Oct 3, 2024