[BE] Accumulator init optimization #4680
Conversation
Nice! Should this be moved to nvidia_gpu pass as it is nvidia specific?
```td
@@ -169,4 +169,14 @@ def TritonGPUCombineTensorSelectAndIf: Pass<"tritongpu-combine-tensor-select-and
                                   "mlir::triton::TritonDialect"];
}

def TritonGPUOptimizeAccumulatorInit: Pass<"tritongpu-optimize-accumulator-init", "mlir::ModuleOp"> {
```
should this go in triton/Dialect/TritonNvidiaGPU/Transforms/Passes.td?
I was trying to keep the code generic enough, via the helpers, that any HW supporting an acc scale or acc init flag could be plugged in easily; that's why I kept it in TritonGPU. I would argue that this is more generic than our matmul pipeliner, which lives in TritonGPU :) But I agree that we don't have anything else right now that can benefit from it.
fair enough
LGTM
Adding a transformation pass that skips filling the accumulator with zeros when the HW supports an accumulator scale or init flag. In that case, a flag value is created, maintained, and passed to the MMA op, indicating whether the accumulator should be taken into account when computing the dot product.
The pass is intended to be generic enough to be reusable across different HW platforms, so it is not placed in the Nvidia-specific folder, even though it currently supports only Hopper MMA.
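The transformation can be sketched in plain Python. Note this is a semantic model only, not the actual Triton IR or Hopper interface: `mma` here is a hypothetical stand-in for an MMA op that takes an accumulator-use flag, and `matmul_loop` shows how the zero-fill is replaced by a flag that starts out false.

```python
import numpy as np

def mma(a, b, acc, use_acc):
    # Hypothetical model of an MMA op with an accumulator flag:
    # when use_acc is False, the accumulator operand is ignored,
    # so its contents never need to be zero-initialized.
    out = a @ b
    if use_acc:
        out += acc
    return out

def matmul_loop(a_tiles, b_tiles, m, n):
    acc = np.empty((m, n))  # uninitialized; no zero-fill needed
    use_acc = False         # first MMA ignores the accumulator
    for a, b in zip(a_tiles, b_tiles):
        acc = mma(a, b, acc, use_acc)
        use_acc = True      # subsequent MMAs accumulate into acc
    return acc
```

The flag plays the role of the init bit maintained by the pass: it is false on the first iteration (so garbage in the uninitialized accumulator is never observed) and true afterwards, matching the result of a loop that starts from a zeroed accumulator.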