
[BE] Accumulator init optimization #4680

Merged
pawelszczerbuk merged 8 commits into triton-lang:main on Sep 10, 2024

Conversation

pawelszczerbuk (Contributor)

Adding a transformation pass that skips filling the accumulator with a zero value when the hardware supports an accumulator scale or init flag. In that case, a flag value is created and maintained and passed to the MMA op, indicating whether the accumulator should be taken into account when computing the dot product.
The pass is intended to be generic enough to be reusable across different hardware platforms, so it is not placed in the Nvidia-specific folder, even though it currently supports only Hopper MMA.
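For context, below is a minimal Triton kernel sketch of the pattern this pass targets (this kernel is not part of the PR; the names and indexing scheme are illustrative assumptions). The accumulator is filled with zeros only so that the first tl.dot in the K loop has something to accumulate onto; with the optimization, that zero fill can be dropped and the MMA op is instead told, via the maintained flag, to ignore the accumulator input on the first iteration. On hardware without such a scale/init flag, the zero initialization stays as is.

```python
# Illustrative only: a plain row-major GEMM kernel, not code from this PR.
import triton
import triton.language as tl


@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr,  # A: (M, K), B: (K, N), C: (M, N), row-major
                  M, N, K,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)

    # The explicit zero fill below is what the pass elides when the hardware
    # exposes an accumulator init/scale flag.
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)

    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptr + offs_m[:, None] * K + (k + offs_k)[None, :])
        b = tl.load(b_ptr + (k + offs_k)[:, None] * N + offs_n[None, :])
        # On the first iteration the accumulator is known to be all zeros, so
        # the MMA can be told to skip adding it instead of reading stored zeros.
        acc = tl.dot(a, b, acc)

    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], acc)
```

The sketch covers a single BLOCK_M x BLOCK_N tile and omits bounds checks; it is only meant to show where the zero-initialized accumulator feeds the dot-product loop.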

ThomasRaoux (Collaborator) left a comment


Nice! Should this be moved to an nvidia_gpu pass, as it is Nvidia specific?

@@ -169,4 +169,14 @@ def TritonGPUCombineTensorSelectAndIf: Pass<"tritongpu-combine-tensor-select-and
"mlir::triton::TritonDialect"];
}

def TritonGPUOptimizeAccumulatorInit: Pass<"tritongpu-optimize-accumulator-init", "mlir::ModuleOp"> {
ThomasRaoux (Collaborator) commented on the new pass definition:

should this go in triton/Dialect/TritonNvidiaGPU/Transforms/Passes.td?

include/triton/Dialect/Triton/IR/Traits.h (review comment resolved)
pawelszczerbuk (Contributor, Author)

> Nice! Should this be moved to an nvidia_gpu pass, as it is Nvidia specific?

I was trying to keep the code generic enough, with the helpers, so that any hardware that supports an accumulator scale or init flag can be plugged in easily; that's why I kept it in TritonGPU. I would argue that this is more generic than our Matmul pipeliner, which lives in TritonGPU :) But I agree that we don't have anything else right now that can benefit from it.

ThomasRaoux (Collaborator)

> Nice! Should this be moved to an nvidia_gpu pass, as it is Nvidia specific?

> I was trying to keep the code generic enough, with the helpers, so that any hardware that supports an accumulator scale or init flag can be plugged in easily; that's why I kept it in TritonGPU. I would argue that this is more generic than our Matmul pipeliner, which lives in TritonGPU :) But I agree that we don't have anything else right now that can benefit from it.

fair enough

ThomasRaoux (Collaborator) left a comment


LGTM

pawelszczerbuk merged commit a0c1bc9 into triton-lang:main on Sep 10, 2024
7 checks passed
2 participants