
[BE] Accumulator init optimization #4680

Merged
pawelszczerbuk merged 8 commits into triton-lang:main on Sep 10, 2024

Conversation

pawelszczerbuk (Contributor)

Adding a transformation pass that skips filling the accumulator with a zero value when the hardware supports an accumulator scale or init flag. In that case, a flag value is created and maintained and passed to the MMA op, indicating whether the accumulator should be taken into account when computing the dot product.
The pass is intended to be generic enough to be reusable across different hardware platforms, so it is not placed in the Nvidia-specific folder, even though it currently supports only Hopper MMA.
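For context, below is a minimal Triton kernel sketch of the pattern this pass targets (this kernel is not part of the PR; the names and indexing scheme are illustrative assumptions). The accumulator is filled with zeros only so that the first tl.dot in the K loop has something to accumulate onto; with the optimization, that zero fill can be dropped and the MMA op is instead told, via the maintained flag, to ignore the accumulator input on the first iteration. On hardware without such a scale/init flag, the zero initialization stays as is.

```python
# Illustrative only: a plain row-major GEMM kernel, not code from this PR.
import triton
import triton.language as tl


@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr,  # A: (M, K), B: (K, N), C: (M, N), row-major
                  M, N, K,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)

    # The explicit zero fill below is what the pass elides when the hardware
    # exposes an accumulator init/scale flag.
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)

    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptr + offs_m[:, None] * K + (k + offs_k)[None, :])
        b = tl.load(b_ptr + (k + offs_k)[:, None] * N + offs_n[None, :])
        # On the first iteration the accumulator is known to be all zeros, so
        # the MMA can be told to skip adding it instead of reading stored zeros.
        acc = tl.dot(a, b, acc)

    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], acc)
```

The sketch covers a single BLOCK_M x BLOCK_N tile and omits bounds checks; it is only meant to show where the zero-initialized accumulator feeds the dot-product loop.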

ThomasRaoux (Collaborator) left a comment


Nice! Should this be moved to an nvidia_gpu pass, as it is Nvidia specific?

@@ -169,4 +169,14 @@ def TritonGPUCombineTensorSelectAndIf: Pass<"tritongpu-combine-tensor-select-and
"mlir::triton::TritonDialect"];
}

def TritonGPUOptimizeAccumulatorInit: Pass<"tritongpu-optimize-accumulator-init", "mlir::ModuleOp"> {
ThomasRaoux (Collaborator) commented on the new pass definition:

should this go in triton/Dialect/TritonNvidiaGPU/Transforms/Passes.td?

include/triton/Dialect/Triton/IR/Traits.h (review comment resolved)
pawelszczerbuk (Contributor, Author)

> Nice! Should this be moved to an nvidia_gpu pass, as it is Nvidia specific?

I was trying to keep the code generic enough, with the helpers, so that any hardware that supports an accumulator scale or init flag can be plugged in easily; that's why I kept it in TritonGPU. I would argue that this is more generic than our Matmul pipeliner, which lives in TritonGPU :) But I agree that we don't have anything else right now that can benefit from it.

ThomasRaoux (Collaborator)

> Nice! Should this be moved to an nvidia_gpu pass, as it is Nvidia specific?

> I was trying to keep the code generic enough, with the helpers, so that any hardware that supports an accumulator scale or init flag can be plugged in easily; that's why I kept it in TritonGPU. I would argue that this is more generic than our Matmul pipeliner, which lives in TritonGPU :) But I agree that we don't have anything else right now that can benefit from it.

fair enough

ThomasRaoux (Collaborator) left a comment


LGTM

pawelszczerbuk merged commit a0c1bc9 into triton-lang:main on Sep 10, 2024
7 checks passed
2 participants