Document that reduce-like functions use `init` one or more times #53945

mbauman · 2024-04-03T22:10:38Z

OK, here's take two, building on the work that was done in #53871. We preserve the neutral element requirement but guarantee that init is used to start all evaluation chains.

The news summarizes as succinctly as I can:

init is now guaranteed to be used for non-empty collections one or more times. This ensures that all calls
to the reducing function op have one argument that is either init or the result from a
previous op evaluation. Previously init was explicitly allowed to be omitted from the reduction entirely.

And the docs for mapreduce say a bit more:

If provided, init is the return value for empty collections and is used one or more times as
an argument to op for non-empty collections. The init value is not transformed by the function f.
Using init ensures that all calls to op have one argument that is either init
or the result from a previous op evaluation, and the ordering of these arguments is
unspecified. As it may appear in the reduction one or more times, it must be a neutral element for op that
does not change the result by being used more than once. It is generally an error to call mapreduce
with empty collections without specifying an init value, but in unambiguous cases an
identity value for op may be returned; see Base.reduce_empty for more details.

Closes #49042.

Co-authored-by: mikmoore <[email protected]>

Co-authored-by: Steven G. Johnson <[email protected]>

and add more clarifications

base/reduce.jl

jariji · 2024-04-03T22:42:16Z

Do we want to specify whether it's op(init, _) or op(_, init)?

base/reduce.jl

mbauman · 2024-04-03T22:49:59Z

Do we want to specify whether it's op(init, _) or op(_, init)?

I don't think there's a meaningful reason to do so, because subsequent steps might be op(op(...), _) or op(_, op(...)). That's the interesting thing in my head — that you ~~always have one argument from the collection and one result_or_init. Since the location of the result isn't defined, I don't think the location of the init should be, either.~~

Edit: Oh duh, that's not true. You sometimes have two results. Duh. But my point is that you never have two elements, and that which one may be a value from the collection is unspecified.

But now I need to clean up that language some more. Shucks. 🤦

NEWS.md

base/reduce.jl

jonas-schulze · 2024-04-04T08:28:25Z

Do we want to specify whether it's op(init, _) or op(_, init)?

I think this is a fair point, especially since fold[lr] immediately refer to reduce. Either option is fine by me. Using the same order for all three (six including map*) makes switching seamless.

It most cases, both arguments will likely have the same type, i.e. op(::T, ::T) where T. But surely, there will be some cases where this is not the case, such that it is necessary to specify whether it is op(acc, val) or the other way around.

I think I had a non-associative example in mind, for which I should have used foldl/foldr instead. We do want to allow op(acc1, acc2).

Seelengrab · 2024-04-04T10:04:12Z

This docstring feels awfully long, especially when viewed in help> or as a function hint in an editor - maybe it's better to have this behavior recorded in the manual, and refer to that section in the docstring?

MasonProtter

Looks great, thank you for taking this on @mbauman. I just have one remaining point of disagreement about assuming purity of f and then I'm fully on board.

base/reduce.jl

StefanKarpinski · 2024-05-01T21:31:51Z

I'm a bit late to the party, but after consideration and discussion with @mbauman, I'm on board with this change. I think the guarantee can be expressed more clearly this way:

When init is not provided, one or more op calls will take collection values as both arguments. When init is provided, collection values will never occur as left arguments to op, only as right arguments; the left argument to op will be init for one or more op calls, or the result of a previous call to op otherwise.

I agree that the wording of the new docs is a bit verbose, but we can always worry about that after the functionality is nailed down.

StefanKarpinski · 2024-05-02T14:05:04Z

The argument ordering is very hard to express without explicitly invoking an op call tree. Perhaps that's the best way to write it anyway. Something like

The collection values are guaranteed to appear as leaves in the tree of op calls in order, but that does not guarantee that those op calls are evaluated in that order. For example, the left and right halves of an array may be reduced concurrently. When the results of those two reductions are combined the left and right reduced values will respectively be the left and right arguments to the final op call.

…itum

mbauman · 2024-07-24T15:15:50Z

I don't like requiring that init needs to be a left-argument to op because that means that foldr would not be a valid implementation for reduce. But otherwise I agree. I'm seeing if I can get the description a little more clear with fresh eyes here.

jariji · 2024-07-24T19:16:02Z

Is "init is the right argument of op in foldr" documented?

mbauman · 2024-07-24T19:18:23Z

As an example, yes.

Try to make each paragraph here "punchier" and more pointed: * First paragraph more specifically describes the general behavior * Second paragraph is about `init`, clarify what it means for `op` args * Third paragraph is about `op`, but point to extended help * Fourth paragraph is about empty collections The extended help now reads clearer with a bulleted list and explicitly mentions argument ordering... and is also a bit more careful about floating point associativity (or the lack thereof) in mapreduce

mbauman · 2024-07-24T20:46:04Z

OK, I've pushed a new re-wording. I find this more clear, personally.

We seem have a very solid consensus that "reduce-like functions use init one or more times" so I've removed the RFC from the title. I think I've faithfully represented that idea in the docs with this latest push.

Of course comments on the wordings are still very welcome, but I am much happier with the state of this now.

and try to fix ≈ docs

mikmoore · 2024-07-25T16:35:07Z

It makes me a tad anxious that we don't guarantee whether the init is a first or second argument to op. Could there be a situation where this is an issue, where we might need distinct left and right neutral elements? Does a neutral element always commute for an associative operation? This might be a trivial result from group theory, but I don't know the answer.

nsajko · 2024-07-25T16:48:41Z

Does a neutral element always commute for an associative operation?

No: https://en.wikipedia.org/wiki/Semigroup#Identity_and_zero

nsajko · 2024-07-25T16:53:22Z

A two-sided identity (or just identity) is an element that is both a left and right identity. Semigroups with a two-sided identity are called monoids. A semigroup may have at most one two-sided identity. If a semigroup has a two-sided identity, then the two-sided identity is the only one-sided identity in the semigroup. If a semigroup has both a left identity and a right identity, then it has a two-sided identity (which is therefore the unique one-sided identity).

So a monoid is required, not just associativity.

mbauman · 2024-07-25T16:54:07Z

I have two reasons why that's not a major concern in my view. One is that there are three "types" of arguments that get passed to op:

init itself
values from itr — which necessarily could be either left- or right-args (but not both when init is provided)
return values from previous op calls — which are also necessarily left- or right-args (or both)

Because the init value is also the return value for empty collections, it should be of a similar "type" (yes, I mean that two ways: there's the literal Type, but it also applies more generally as a softer "interpretation" or "category of thing") as op's return values. And op necessarily needs to be able to accommodate previous return values in either slot.

The second is more simplistic, but it really seems like foldr should be a valid implementation for reduce, and foldr(=>, [1,2,3], init=0) is (and should be) 1 => (2 => (3 => 0)).

mbauman · 2024-07-25T17:17:49Z

Perhaps more succinctly, just look at foldl. With init, it will pass all its itr values as right-args. All the left args (except one) are results from previous operations. And init is that final left-arg. Flip the associativity with -r and the argument orderings flip. So init's placement flips. It makes sense that an arbitrary associativity would grant this leeway.

If you care about argument orderings between init and an itr value, you probably care about the argument orderings between an op result and an itr value — and so then you care about the associativity. And you should use foldl or foldr.

stevengj · 2024-07-25T20:34:36Z

When the operation is +, should we check iszero(init) for backward compatibility and print a deprecation warning if it is false (calling mapfoldl as a fallback)?

mbauman · 2024-07-25T21:05:53Z

Well the previous documentation explicitly says that it may or may not be used... but it'd probably be a good idea to get an idea of how many folks were relying on the old implementation. I say we first try the "full pairwise" in #52397 so we can run PkgEval there to gather the data. A deprecation path would be kind but would probably have to be fairly finely scoped (the generic iszero can throw method errors).

mikmoore · 2024-08-30T14:51:10Z

One behavior I'd like to preserve is

julia> sum(fill(-0.0, 0))
0.0

julia> sum(fill(-0.0, 3))
-0.0

It seems that the way to get this is to make sum on IEEEFloat use init=-0.0 (the IEEEFloat additive identity) but allow (the option of) a distinct return value of +0.0 for the isempty case. I hate that this makes the implementation more complicated but I don't see a great way around it. Thanks to mbauman for clarifying the missing-init case below. It was probably made clear a while ago but I came back to this after seeing the discussion on 55318 today. If I'd re-read more carefully I might have noticed that probably nobody was proposing we provide default values for init.

I would be somewhat unhappy if a sum of -0.0s yielded +0.0 (sum(x) would differ from +(x...) in a not-just-reassociation way), and I expect the change would break things (both existing and yet-unwritten) in at least a few places, but I know most people would be unhappy if an empty sum yielded -0.0 (and this would almost certainly break things in many places).

(+0.0 vs -0.0 might seem pretty inconsequential until you write inv(sum(x)))

mbauman · 2024-08-30T15:50:03Z

I've become completely radicalized against initial value guesswork when it's not specified, and the proposed behavior change here is only about what to do when init is specified.

Guessing at init is simply so bug-prone. I think the only sensible answer here is ensuring we always do what the change in #55318 does for dimensional reductions. That is:

0 values:

Have init? Return it. Otherwise:

Return mapreduce_empty (usually an error, sometimes an unambiguous value like 0.0)

1 value (a):

Have init? Return op(init, f(a)). Otherwise:

Return mapreduce_first (typically f(a))

2+ values (a1, a2):

Have init? Start every chain with op(op(init, f(a1)), f(a2)) and continue. Otherwise:

Start every chain with op(f(a1), f(a2)) and continue

In other words, neither 0.0 nor -0.0 in your above cases come from a divined-out init element that wasn't passed.

mbauman and others added 8 commits March 26, 2024 18:30

Add more clarity to init's role in reduce and mapreduce

f36ffbb

fix too many dots......

32e762b

Co-authored-by: mikmoore <[email protected]>

Incorporate feedback; define left-most position for init

549ceac

Apply suggestions from code review

19b76d4

Co-authored-by: Steven G. Johnson <[email protected]>

Add NEWS.md

793a4e0

typo fix

7859c7a

Revert back to the neutral element init

a5288b6

and add more clarifications

update NEWS

0b09dcd

mbauman mentioned this pull request Apr 3, 2024

Change init's role in reduce-like functions: remove "neutral element" restriction and guarantee its use #53871

Closed

mbauman commented Apr 3, 2024

View reviewed changes

base/reduce.jl Outdated Show resolved Hide resolved

minor wording

d2a1fbb

mbauman added docs This change adds or pertains to documentation fold sum, maximum, reduce, foldl, etc. labels Apr 3, 2024

mbauman commented Apr 3, 2024

View reviewed changes

base/reduce.jl Outdated Show resolved Hide resolved

mbauman and others added 2 commits April 3, 2024 18:24

more minor wording

a1963f3

Propagate changes to other reduce family members

653f33b

mbauman requested review from stevengj and MasonProtter April 3, 2024 22:33

mbauman commented Apr 3, 2024

View reviewed changes

base/reduce.jl Outdated Show resolved Hide resolved

base/reduce.jl Outdated Show resolved Hide resolved

whitespace

c1d9110

jariji reviewed Apr 3, 2024

View reviewed changes

base/reduce.jl Outdated Show resolved Hide resolved

mbauman commented Apr 3, 2024

View reviewed changes

NEWS.md Outdated Show resolved Hide resolved

base/reduce.jl Outdated Show resolved Hide resolved

base/reduce.jl Outdated Show resolved Hide resolved

fixup wordings around the arguments to op

05c9e7a

stevengj reviewed Apr 3, 2024

View reviewed changes

base/reduce.jl Outdated Show resolved Hide resolved

MasonProtter suggested changes Apr 5, 2024

View reviewed changes

base/reduce.jl Outdated Show resolved Hide resolved

use extended help for (map)?reduce, propagate language to sum et al

28669d3

Merge remote-tracking branch 'origin/master' into mb/the-bourne-ultin…

aa6bb2e

…itum

stevengj mentioned this pull request Jul 24, 2024

use pairwise order for mapreduce on arbitrary iterators #52397

Draft

mbauman changed the title ~~RFC: require reduce-like functions to use init one or more times~~ Document that reduce-like functions use init one or more times Jul 24, 2024

restore the note on re-use of f's results

f48a2c2

mbauman mentioned this pull request Jul 24, 2024

Clarify map/mapreduce's documentation with respect to memoization #55238

Open

mbauman added 2 commits July 24, 2024 17:25

remove ref to non-public Base.reduce_empty

141c09c

and try to fix ≈ docs

fix typo in reduce @ref

15aa251

mbauman mentioned this pull request Jul 31, 2024

WIP: a more principled take on dimensional reduction inits #55318

Open

mbauman mentioned this pull request Aug 28, 2024

Add init kwarg to mean/median to match other reductions in Base JuliaStats/Statistics.jl#105

Open

adienes mentioned this pull request Sep 11, 2024

Clarify the possible uses of the init keyword in minimum, maximum and extrema #44819

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document that reduce-like functions use `init` one or more times #53945

Document that reduce-like functions use `init` one or more times #53945

mbauman commented Apr 3, 2024 •

edited

Loading

jariji commented Apr 3, 2024

mbauman commented Apr 3, 2024 •

edited

Loading

jonas-schulze commented Apr 4, 2024 •

edited

Loading

Seelengrab commented Apr 4, 2024

MasonProtter left a comment

StefanKarpinski commented May 1, 2024 •

edited

Loading

StefanKarpinski commented May 2, 2024

mbauman commented Jul 24, 2024

jariji commented Jul 24, 2024

mbauman commented Jul 24, 2024

mbauman commented Jul 24, 2024

mikmoore commented Jul 25, 2024

nsajko commented Jul 25, 2024

nsajko commented Jul 25, 2024

mbauman commented Jul 25, 2024 •

edited

Loading

mbauman commented Jul 25, 2024 •

edited

Loading

stevengj commented Jul 25, 2024 •

edited

Loading

mbauman commented Jul 25, 2024

mikmoore commented Aug 30, 2024 •

edited

Loading

mbauman commented Aug 30, 2024 •

edited

Loading

Document that reduce-like functions use init one or more times #53945

Are you sure you want to change the base?

Document that reduce-like functions use init one or more times #53945

Conversation

mbauman commented Apr 3, 2024 • edited Loading

jariji commented Apr 3, 2024

mbauman commented Apr 3, 2024 • edited Loading

jonas-schulze commented Apr 4, 2024 • edited Loading

Seelengrab commented Apr 4, 2024

MasonProtter left a comment

Choose a reason for hiding this comment

StefanKarpinski commented May 1, 2024 • edited Loading

StefanKarpinski commented May 2, 2024

mbauman commented Jul 24, 2024

jariji commented Jul 24, 2024

mbauman commented Jul 24, 2024

mbauman commented Jul 24, 2024

mikmoore commented Jul 25, 2024

nsajko commented Jul 25, 2024

nsajko commented Jul 25, 2024

mbauman commented Jul 25, 2024 • edited Loading

mbauman commented Jul 25, 2024 • edited Loading

stevengj commented Jul 25, 2024 • edited Loading

mbauman commented Jul 25, 2024

mikmoore commented Aug 30, 2024 • edited Loading

mbauman commented Aug 30, 2024 • edited Loading

Document that reduce-like functions use `init` one or more times #53945

Document that reduce-like functions use `init` one or more times #53945

mbauman commented Apr 3, 2024 •

edited

Loading

mbauman commented Apr 3, 2024 •

edited

Loading

jonas-schulze commented Apr 4, 2024 •

edited

Loading

StefanKarpinski commented May 1, 2024 •

edited

Loading

mbauman commented Jul 25, 2024 •

edited

Loading

mbauman commented Jul 25, 2024 •

edited

Loading

stevengj commented Jul 25, 2024 •

edited

Loading

mikmoore commented Aug 30, 2024 •

edited

Loading

mbauman commented Aug 30, 2024 •

edited

Loading