Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding reduce to Ops.jl #840

Merged
merged 7 commits into from
Mar 8, 2025
Merged

adding reduce to Ops.jl #840

merged 7 commits into from
Mar 8, 2025

Conversation

tharittk
Copy link
Contributor

@tharittk tharittk commented Mar 3, 2025

I deliberately did not test on non-commutative operators—it will fail when tested against Julia's reduce. I read the discussion on the internet that it is supposed to be so, i.e., reduce should take a commutative operator; otherwise, the result will be non-deterministic, especially in a parallel system.

Copy link
Member

@giordano giordano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deliberately did not test on non-commutative operators—it will fail when tested against Julia's reduce.

Does that mean that the reduce method introduced here has different semantics than Julia's reduce? That'd be unideal, asi it can lead to hard-to-track bugs when people write generic code and Reactant suddenly changes their meaning (similar for example to #755)

@mofeing
Copy link
Collaborator

mofeing commented Mar 3, 2025

one thing to note is that methods in Ops should replicate the semantics of StableHLO (and possibly other MLIR dialects), not Julia.

the adaptation from StableHLO to Julia semantics is done in the method specializations outside the Ops module.

@giordano
Copy link
Member

giordano commented Mar 3, 2025

one thing to note is that methods in Ops should replicate the semantics of StableHLO (and possibly other MLIR dialects), not Julia.

Ah, yes, that's a good point. This is not extending Base.reduce, so that's fine (although the same name is slightly confusing 😅).

@mofeing
Copy link
Collaborator

mofeing commented Mar 3, 2025

... (although the same name is slightly confusing 😅).

yep 😅, but it's because it wraps the stablehlo.reduce op. as a rule of thumb, you should not import Ops but call them explicitly like Ops.reduce to avoid confusion.

@@ -2313,4 +2313,50 @@ Produces a [`Reactant.MLIR.Dialects.sdy.sharding_constraint`](@ref) operation wi
end
end

@noinline function reduce(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you follow up with a PR to update the mapreduce impl to call this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I'll do that

@avik-pal
Copy link
Collaborator

avik-pal commented Mar 4, 2025

the cuda reduce tests are failing

@tharittk
Copy link
Contributor Author

tharittk commented Mar 4, 2025

Based on my initial investigation, it seems to be about how the init_values is handled. In case of CPU - via Reactant.set_default_backend("cpu"), there is no issue.

But when using GPU, if the init_values is set to be 1 for * and 0 for +, for example, it will be OK. Other than that, it looks like the GPU applies init_values multiple times.

For example, if we start with
A = [1 3; 2 4;;; 5 7; 6 8;;; 9 11; 10 12]

with init_values = 2, and dim = [3] then
we have [(15+2) (21+2); (18+2) (24+2)] = [17 23; 20 26] -- both CPU and GPU version agree

with init_values = 2, and dim = [1, 3] then
We should have [(33 + 2) (45+2)] -- init_value only applied once (this is what Julia's reduce() gives also)
but with stablehlo + GPU, we get [(17+20) (23+26)] -- effectively init_values is applied twice

I am not sure if this is a sematic issue or something else.

What I think happens is stablehlo broadcasts the init_values at the beginning and do reduce while Julia's reduce does the reduce and then broadcasts the init_values to whatever size is after reduce at the end

@tharittk
Copy link
Contributor Author

tharittk commented Mar 4, 2025

Ok I was not reading the specification carefully. Here what it says

Semantics
Applies a reduction function body to inputs and init_values along the dimensions and produces results tensors.
The order of reductions is implementation-defined, which means that body and init_values must form a monoid to
guarantee that the operation produces the same results for all inputs on all implementations. However, this condition
doesn't hold for many popular reductions. E.g. floating-point addition for body and zero for init_values don't actually form
a monoid because floating-point addition is not associative.

So it is the word monoid. I honestly google that word and, to my understanding, it means the operation must be associative and the init_values has to be the identity of that operation to guarantee the same result across all implementation.

@giordano
Copy link
Member

giordano commented Mar 4, 2025

Here what it says

For the benefit of readers, this is the source: https://openxla.org/stablehlo/spec#semantics_71

@avik-pal
Copy link
Collaborator

avik-pal commented Mar 5, 2025

Run JuliaFormatter on the code to fix the formatter ci

Copy link

codecov bot commented Mar 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 39.82%. Comparing base (b6ffc96) to head (a18e826).
Report is 538 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #840       +/-   ##
===========================================
+ Coverage   21.66%   39.82%   +18.15%     
===========================================
  Files          46      104       +58     
  Lines        8048    16919     +8871     
===========================================
+ Hits         1744     6738     +4994     
- Misses       6304    10181     +3877     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Pangoraw
Copy link
Collaborator

Pangoraw commented Mar 6, 2025

Could you add a docstring explaining the difference with Julia's reduce?

@tharittk
Copy link
Contributor Author

tharittk commented Mar 7, 2025

hmm.. the latest update from upstream branch seems to break this, specifically for macOS. Anyone has any suggestion?

@avik-pal avik-pal merged commit 5f9d523 into EnzymeAD:main Mar 8, 2025
37 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants