
Add AMDGPU dialect scaffolding #4685

Merged · 3 commits · Sep 20, 2024

Conversation

oplavsic
Contributor

This PR adds scaffolding for defining an AMD-specific dialect and lowering it to LLVM.

Collaborator
@antiagainst antiagainst left a comment

Thanks! A couple of comments.

add_subdirectory(TritonAMDGPUToLLVM)
add_subdirectory(TritonAMDGPUTransforms)
add_subdirectory(Dialect)
Collaborator

You want to put Dialect first, given the others may depend on it.
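A hedged sketch of the reordering the reviewer suggests (directory names taken from the excerpt above; whether the dependency actually exists is an assumption):

```cmake
# Dialect first: the conversion and transform subdirectories may
# depend on CMake targets that Dialect defines.
add_subdirectory(Dialect)
add_subdirectory(TritonAMDGPUToLLVM)
add_subdirectory(TritonAMDGPUTransforms)
```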

#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Pass/Pass.h"

namespace mlir {
Collaborator

Nit: namespace mlir::triton.
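The nit refers to the C++17 nested-namespace definition, which replaces two nested `namespace` blocks with one. A minimal standalone illustration (the `greeting` function is invented for demonstration, not part of the PR):

```cpp
#include <string>

// C++17 nested namespace definition: `namespace mlir::triton { ... }`
// is equivalent to `namespace mlir { namespace triton { ... } }`.
namespace mlir::triton {
std::string greeting() { return "hello from mlir::triton"; }
} // namespace mlir::triton
```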


def ConvertAMDGPUToLLVM : Pass<"convert-amd-gpu-to-llvm", "mlir::ModuleOp"> {
let summary = "Convert AMDGPU to LLVM";
let description = [{
Collaborator

Can we put some description here?

let cppNamespace = "::mlir::triton::amdgpu";

let description = [{
AMDGPU Dialect.
Collaborator

let summary = "Triton AMDGPU dialect".
let description = [{
  This dialect hosts operations used for lowering patterns in the Triton AMD backend.
}];



#ifndef AMDGPU_OPS
Collaborator

Can we prefix these #ifndef guards in TD files with TRITON_ to avoid potential collision with other sources? Upstream has an AMDGPU dialect too.

Also for other TD files in this patch.
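A sketch of what the prefixed guard could look like in a TableGen file (the file name and guard spelling are illustrative, not the PR's final choice):

```tablegen
// TritonAMDGPUOps.td — guard prefixed with TRITON_ so it cannot
// collide with the upstream MLIR AMDGPU dialect's AMDGPU_OPS guard.
#ifndef TRITON_AMDGPU_OPS
#define TRITON_AMDGPU_OPS

// ... op definitions ...

#endif // TRITON_AMDGPU_OPS
```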

}
} // namespace mlir::triton::AMD

class ConvertAMDGPUToLLVM
Collaborator

Nit: can use struct given no private fields anyway.
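The point of the nit: members of a `struct` are public by default, so a type with no private state needs no `public:` label. A toy illustration (`ConvertToLLVMSketch` is a hypothetical stand-in, not the PR's class):

```cpp
// With no private members, `struct` saves the explicit `public:`
// access specifier that the equivalent `class` would require.
struct ConvertToLLVMSketch {
  int runCount = 0;           // public by default in a struct
  void run() { ++runCount; }  // public by default in a struct
};
```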


using namespace mlir;
using namespace mlir::triton;
using namespace mlir::triton::gpu;
Collaborator

Not a fan of just using all namespaces. Can we drop all except the first two? We can consider adding them later if necessary, but I don't want to enable all of them from the get-go.

mlir::LowerToLLVMOptions option(context);

TritonGPUToLLVMTypeConverter typeConverter(context, option);
ModuleAxisInfoAnalysis axisInfoAnalysis(mod);
Collaborator

Don't think we need this right now? Can we delete and add later when really needed?

public:
explicit AMDDialectLLVMConversionTarget(MLIRContext &ctx)
: ConversionTarget(ctx) {
addLegalDialect<LLVM::LLVMDialect>();
Collaborator

You can pass multiple dialects to the same addLegalDialect<...> call.
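A non-compilable sketch of the suggestion, assuming a ConversionTarget named `target` in scope; MLIR's addLegalDialect is a variadic template, so the per-dialect calls in the surrounding excerpt collapse into one:

```cpp
// One variadic call instead of one addLegalDialect call per dialect.
target.addLegalDialect<LLVM::LLVMDialect, mlir::scf::SCFDialect,
                       triton::TritonDialect,
                       triton::gpu::TritonGPUDialect>();
```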

addLegalDialect<mlir::scf::SCFDialect>();
addLegalDialect<triton::TritonDialect>();
addLegalDialect<triton::gpu::TritonGPUDialect>();
addIllegalDialect<triton::nvidia_gpu::TritonNvidiaGPUDialect>();
Collaborator

For illegal ones, maybe use markUnknownOpDynamicallyLegal to return false for all of them? This way we are a bit more future-proof, so we don't need to remember to update this after dialect changes.
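A non-compilable sketch of that suggestion, again assuming a ConversionTarget named `target`; any op not explicitly marked legal is then treated as illegal, so new dialects never need to be enumerated one by one:

```cpp
// Ops with no explicit legality rule fall through to this callback,
// which rejects them all.
target.markUnknownOpDynamicallyLegal([](Operation *) { return false; });
```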

Collaborator
@ThomasRaoux ThomasRaoux left a comment

What would be the level of abstraction of this new dialect? I assume it is the same as TritonGPU?

@@ -64,6 +66,8 @@ void init_triton_amd_passes_ttgpuir(py::module &&m) {
mlir::createTritonAMDGPUStreamPipelinePass);
ADD_PASS_WRAPPER_1("add_stream_pipelinev2",
mlir::createTritonAMDGPUStreamPipelineV2Pass, int);
ADD_PASS_WRAPPER_0("add_amdgpu_to_llvm",
mlir::triton::createConvertAMDGPUToLLVMPass);
Collaborator

Why do we need a separate pass for AMDGPU to LLVM? Ideally this is done with extra patterns on top of the TritonGPU-to-LLVM conversion, since breaking the conversion to LLVM into multiple passes has downsides: it produces an intermediate IR that mixes block and SIMT semantics, and it prevents pattern matching during lowering.

@antiagainst
Collaborator

> What would be the level of abstraction of this new dialect? I assume it is the same as TritonGPU?

Yup, at the TritonGPU level. I just realized that the initial commit message doesn't explain the anticipated usage of this dialect well enough. Basically it's meant to complement TritonGPU in the lowering process on the AMD side, like what we have for the NVIDIA ones. Example ops we are considering right now (not definitely confirmed we will have them):

  • Buffer load/store ops, like upstream amdgpu.raw_buffer_load and store. This way we can immediately generate such ops after canonicalizing pointer calculations, so that we don't need to rediscover which tt.load/tt.store to convert when converting to LLVM.
  • Instruction scheduling related ops. The idea is to introduce some anchor ops so that we can mark "regions" for llvm.amdgpu.sched_barrier and llvm.amdgpu.sched_group_barrier injection at a later step, after LLVM conversion.
  • etc.
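As a purely hypothetical illustration of the first bullet, a TritonGPU-level buffer load op in TableGen might look roughly like this (every name, base class, and type constraint below is invented for the sketch; none of it comes from this PR):

```tablegen
// Hypothetical op sketch only — not an op added by this PR.
def TritonAMDGPU_BufferLoadOp : TT_AMDGPU_Op<"buffer_load"> {
  let summary = "load a tensor through a scalar buffer base pointer";
  let arguments = (ins TT_Ptr:$basePtr, TT_IntTensor:$offsets);
  let results = (outs TT_Tensor:$result);
}
```

The appeal, per the comment above, is that pointer canonicalization can emit this op directly, so the later LLVM conversion no longer has to re-analyze which tt.load ops qualify.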

@ThomasRaoux
Collaborator

> What would be the level of abstraction of this new dialect? I assume it is the same as TritonGPU?
>
> Yup, at the TritonGPU level. I just realized that the initial commit message doesn't explain the anticipated usage of this dialect well enough. Basically it's meant to complement TritonGPU in the lowering process on the AMD side, like what we have for the NVIDIA ones. Example ops we are considering right now (not definitely confirmed we will have them):
>
>   • Buffer load/store ops, like upstream amdgpu.raw_buffer_load and store. This way we can immediately generate such ops after canonicalizing pointer calculations, so that we don't need to rediscover which tt.load/tt.store to convert when converting to LLVM.
>   • Instruction scheduling related ops. The idea is to introduce some anchor ops so that we can mark "regions" for llvm.amdgpu.sched_barrier and llvm.amdgpu.sched_group_barrier injection at a later step, after LLVM conversion.
>   • etc.

That's what I was expecting, and I think it makes sense. One thing to consider for the naming: on the Nvidia side the analogous dialect is called TritonNvidiaGPU, and there is a dialect at the LLVM level of abstraction called NVGPU. So my point is maybe it should be called TritonAMDGPU, although it does feel a bit verbose.

@antiagainst
Collaborator

Yeah that makes sense--we want to be consistent regarding naming there. Better to have that done at the beginning.

Collaborator
@antiagainst antiagainst left a comment

LGTM! Just one nit regarding documentation.

let cppNamespace = "::mlir::triton::amdgpu";

let description = [{
TritonAMDGPU Dialect is used to support AMD specific instructions on TritonGPU level abstraction.
Collaborator

.. hosts AMD specific ops at TritonGPU abstraction level.

Contributor Author

Done. Thanks for the review!

@antiagainst antiagainst marked this pull request as ready for review September 20, 2024 05:11
Collaborator
@ThomasRaoux ThomasRaoux left a comment

LGTM
Note that it will need to be added to RegisterTritonDialects.h to work with our tools

@antiagainst antiagainst merged commit d6a11a4 into triton-lang:main Sep 20, 2024
7 checks passed
3 participants