WIP: a more principled take on dimensional reduction inits #55318
base: master
Conversation
test/reducedim.jl
Outdated
@@ -209,9 +209,9 @@ end
 for f in (minimum, maximum)
     @test_throws "reducing over an empty collection is not allowed" f(A, dims=1)
-    @test isequal(f(A, dims=2), zeros(Int, 0, 1))
+    @test_broken isequal(f(A, dims=2), zeros(Int, 0, 1))
I marked these as broken as a reminder to myself; I'm not entirely sure that we want/need to preserve these behaviors.
This is now split out into #55628.
I think I've addressed the most egregious of the performance issues, but I've only done spot tests so far. Now I need to tackle those stdlibs.
base/reducedim.jl
Outdated
# * One value: Fill with `mapreduce_first` & return
# * Two+ values: Fill with `op(f(a1), f(a2))`, proceed <--- this is the hard case:
#   * Proceeding involves skipping the first two inner iterations
_mapreduce_dim(::Type{F}, op::G, ::_InitialValue, A::AbstractArrayOrBroadcasted, dims) where {F, G} = _mapreduce_dim(x->F(x), op, Base._InitialValue(), A, dims)
I needed this to get `maximum(Float64, reshape(1:4, 2, :); dims = 2)` to infer. Something something specialization heuristics.
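For context, a hedged sketch of the heuristic I believe is at play (my illustration, not part of the diff): Julia avoids specializing a method on a `::Type` argument that is merely passed through to another call, whereas a closure like `x -> F(x)` carries `F` in its own concrete type, so the callee specializes and inference succeeds.

consume(g, x) = g(x)   # hypothetical stand-in for the downstream reduction machinery

# `F` is only passed through here, so Julia may not specialize on it:
passthrough(::Type{F}, x) where {F} = consume(F, x)

# Wrapping `F` in a closure hands `consume` a concretely-typed callable:
wrapped(::Type{F}, x) where {F} = consume(y -> F(y), x)

using Test
@inferred wrapped(Float64, 1)   # returns 1.0 and infers Float64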
@@ -610,9 +610,10 @@ function unordered_test_for_extrema(a; dims_test = ((), 1, 2, (1,2), 3))
 for dims in dims_test
     vext = extrema(a; dims)
     vmin, vmax = minimum(a; dims), maximum(a; dims)
-    @test isequal(extrema!(copy(vext), a), vext)
+    @test isequal(extrema!(similar(vext, NTuple{2, eltype(a)}), a), vext)
This fails as originally written because this is reducing over a `Union{T, Missing}`, and depending upon the values in `a`, `vext` might incrementally widen to a `T[]` or a `Missing[]` or a `Union{T, Missing}[]`... which then isn't able to be used in general. This would be representative of a class of failure that this PR might introduce in the ecosystem.
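To illustrate the hazard (a hedged sketch of my own, assuming the PR's value-dependent widening): the concrete array type returned by a reduction over `Union{Int, Missing}` data would depend on the values, so a destination allocated from one input may be too narrow for another.

a_ints  = Union{Int, Missing}[1 2; 3 4]
a_mixed = Union{Int, Missing}[1 2; 3 missing]

r = maximum(a_ints; dims = 1)   # with incremental widening: narrows to Matrix{Int}
# maximum!(r, a_mixed)          # would then fail: `missing` can't be
#                               # stored into a Matrix{Int}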
The custom allocators are needed for SparseArrays.
    end
else
    # Take care of the smallest cartesian "rectangle" that includes our two peeled elements
    small_inner, large_inner = _split2(inner1)
This optimization is not valid if `_split2` decides not to split the last reduced dimension: concatenated `CartesianIndices` blocks that split an earlier dimension visit elements in a different order than the array's native column-major traversal, e.g.

julia> I1 = collect(Iterators.flatten([CartesianIndices((1:2, 10)), CartesianIndices((3:10, 10))]));

julia> I2 = collect(Iterators.flatten([CartesianIndices((10, 1:1)), CartesianIndices((10, 2:10))]));

julia> I1 == I2
false

julia> sort(I1) == I2
true
A better solution might be skipping the first 2 elements manually, e.g.

indsAt, indsRt = safe_tail(Aaxs), safe_tail(raxs)
keep, Idefault = Broadcast.shapeindexer(indsRt)
skip = mergeindices(is_outer_dim, a_inds, i1:i2)
skipt = Base.IteratorsMD.split(skip, Val(1))[2]
if length(raxs[1]) == 1 && length(Aaxs[1]) > 1 # reducedim1(R, A)
    i1 = first(raxs[1])
    @inbounds for IA in CartesianIndices(indsAt)
        IR = Broadcast.newindex(IA, keep, Idefault)
        r = R[i1, IR]
        offset = IA in skipt ? 2 : 0
        @simd for i in Aaxs[1][begin + offset:end]
            r = op(r, f(A[i, IA]))
        end
        R[i1, IR] = r
    end
else
    @inbounds for IA in CartesianIndices(indsAt)
        IA in skipt && continue
        IR = Broadcast.newindex(IA, keep, Idefault)
        @simd for i in Aaxs[1]
            R[i, IR] = op(R[i, IR], f(A[i, IA]))
        end
    end
end
for i in Iterators.drop(outer_inds, 1)
    innerj = mergeindices(is_outer_dim, i, a_inds)
    j1, j2 = innerj
    @inbounds c1, c2 = A[j1], A[j2]
    @inbounds R[i] = _mapreduce_start(f, op, init, c1, c2)
end
This loop might need manual inner loop expansion, as it blocks possible vectorization if there's no reduction on dim 1. Another solution is adding `@simd` support to `drop(::CartesianIndices)`:
https://github.com/N5N3/julia/blob/fd1b3a19770d4948451c2d32cf894ea1b9536038/base/multidimensional.jl#L703-L716
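For concreteness, a hedged sketch of what that manual expansion could look like (my illustration reusing the helpers from the hunk above, not the linked implementation): peel the first axis out of the outer iteration so the innermost loop is a plain range that `@simd` can vectorize, instead of iterating `Iterators.drop(outer_inds, 1)` directly.

# Assumes `outer_inds` is a CartesianIndices; `mergeindices`,
# `is_outer_dim`, `a_inds`, and `_mapreduce_start` are from the hunk above.
outer_tail = CartesianIndices(Base.tail(outer_inds.indices))
ax1 = outer_inds.indices[1]
for It in outer_tail
    # `drop(outer_inds, 1)` skips exactly one index: the first along
    # dim 1 at the first tail index. Reproduce that with an offset.
    start = It == first(outer_tail) ? 2 : 1
    @simd for i1 in ax1[start:end]
        i = CartesianIndex(i1, It)
        innerj = mergeindices(is_outer_dim, i, a_inds)
        j1, j2 = innerj
        @inbounds c1, c2 = A[j1], A[j2]
        @inbounds R[i] = _mapreduce_start(f, op, init, c1, c2)
    end
end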
As part of the work towards #52397 and #53945, this drastically simplifies the initialization of dimensional reductions and takes steps towards a unification of the (non-dimensional) `mapreduce` with the `dims` flavor.

This completely removes all guesswork on initial values for the dimensional reductions. There are 2*3 possible ways we should start a reduction, depending upon how many values are in the reduction and if there's an init (see the REPL sketch after this list):

- Zero values: `mapreduce_empty` (usually an error, sometimes an unambiguous value like `0.0`)
- One value (`a`): with an init, `op(init, f(a))`. Otherwise: `mapreduce_first` (typically `f(a)`)
- Two+ values (`a1`, `a2`): with an init, `op(op(init, f(a1)), f(a2))` and continue. Otherwise: `op(f(a1), f(a2))` and continue
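To make those starting cases concrete, here's a quick REPL sketch against today's non-`dims` entry points (abbreviated output from a current Julia session; `Base.mapreduce_first` is an internal helper):

julia> maximum(Int[])                        # zero values, no init
ERROR: ArgumentError: reducing over an empty collection is not allowed [...]

julia> sum(Int[])                            # zero values, but `+` has an unambiguous empty value
0

julia> mapreduce(abs, +, [-3]; init = 10)    # one value with an init: op(init, f(a))
13

julia> Base.mapreduce_first(abs, +, -3)      # one value, no init: typically f(a)
3

julia> mapreduce(abs, +, [-3, 4]; init = 10) # two values with an init
17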
When looking at dimensional reductions, the above are all arrays of multiple such values, which means we need to get an `eltype`. This uses the builtin incremental widening on every single update to the `R` vector. That wouldn't normally be so tricky: it's just two nested `for` loops. The trouble is that the naive double loop leaves 100x+ performance on the table when it doesn't iterate in a cache-friendly manner. So you might indeed want to update a single value in `R` multiple times, and possibly widen as you do so; a minimal sketch of that pattern follows.
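Here's what that pattern looks like in miniature (my illustration, not the PR's implementation): a cache-friendly `dims = 2` reduction over a matrix, updating each `R[i]` once per column and widening `R`'s element type mid-reduction whenever an update demands it.

function naive_reduce_dims2(f, op, A::AbstractMatrix)
    R = map(f, A[:, 1])   # seed each slot from the first column (value-based eltype)
    for j in Iterators.drop(axes(A, 2), 1), i in axes(A, 1)  # cache-friendly: column-major
        v = op(R[i], f(A[i, j]))
        if v isa eltype(R)
            R[i] = v
        else
            # Widen: allocate a wider array, copy what we have, and carry on.
            Rnew = similar(R, Base.promote_typejoin(eltype(R), typeof(v)))
            copyto!(Rnew, R)
            Rnew[i] = v
            R = Rnew
        end
    end
    return R
end

# e.g. naive_reduce_dims2(identity, +, Union{Int, Missing}[1 2; 3 missing])
# seeds R as a Vector{Int} and widens it to Vector{Union{Int, Missing}}
# the moment the reduction hits `missing`.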
It's so very tempting to do the standard Julian `mapreduce(f, op, A, dims) = mapreduce!(f, op, init(f, op, A, dims), A)` idiom. But that simply doesn't work because there's not a single way to initialize the beginning of a reduction. There are SIX. And once we add support for pre-allocated result vectors, you even more so want to be able to just write, e.g., `sum!(r, A) = mapreduce!(identity, +, first_step!(identity, +, r, A), A)`. But at that point you've already done the first step, so you need to be able to continue the reduction from an arbitrary point in the iteration.

This is a WIP because I ~~expect it to have some fairly significant performance regressions at the moment; some may be easy to recover, others may be harder~~ need to fix the stdlib failures and then let nanosoldier loose.

Closes: #45566, closes #47231, closes #55213, closes #31427, closes #41054, closes #54920, closes #42259, closes #38660, closes #21097, closes #32366, closes #54875, closes #43731.
Addresses half of #45562.