
Add AMDGPU dialect scaffolding #4685

Merged · 3 commits · Sep 20, 2024

Conversation

oplavsic
Contributor

This PR adds scaffolding for defining an AMD-specific dialect and lowering it to LLVM.

Collaborator
@antiagainst antiagainst left a comment

Thanks! A couple of comments.

add_subdirectory(TritonAMDGPUToLLVM)
add_subdirectory(TritonAMDGPUTransforms)
add_subdirectory(Dialect)
Collaborator

You want to put Dialect first, given the others may depend on it.
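A hedged sketch of the reordering the reviewer suggests (directory names taken from the excerpt above; whether the dependency actually exists is an assumption):

```cmake
# Dialect first: the conversion and transform subdirectories may
# depend on CMake targets that Dialect defines.
add_subdirectory(Dialect)
add_subdirectory(TritonAMDGPUToLLVM)
add_subdirectory(TritonAMDGPUTransforms)
```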

#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Pass/Pass.h"

namespace mlir {
Collaborator

Nit: namespace mlir::triton.
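The nit refers to the C++17 nested-namespace definition, which replaces two nested `namespace` blocks with one. A minimal standalone illustration (the `greeting` function is invented for demonstration, not part of the PR):

```cpp
#include <string>

// C++17 nested namespace definition: `namespace mlir::triton { ... }`
// is equivalent to `namespace mlir { namespace triton { ... } }`.
namespace mlir::triton {
std::string greeting() { return "hello from mlir::triton"; }
} // namespace mlir::triton
```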


def ConvertAMDGPUToLLVM : Pass<"convert-amd-gpu-to-llvm", "mlir::ModuleOp"> {
let summary = "Convert AMDGPU to LLVM";
let description = [{
Collaborator

Can we put some description here?

let cppNamespace = "::mlir::triton::amdgpu";

let description = [{
AMDGPU Dialect.
Collaborator

let summary = "Triton AMDGPU dialect".
let description = [{
  This dialect hosts operations used for lowering patterns in the Triton AMD backend.
}];



#ifndef AMDGPU_OPS
Collaborator

Can we prefix these #ifndef guards in TD files with TRITON_ to avoid potential collision with other sources? Upstream has an AMDGPU dialect too.

Also for other TD files in this patch.
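A sketch of what the prefixed guard could look like in a TableGen file (the file name and guard spelling are illustrative, not the PR's final choice):

```tablegen
// TritonAMDGPUOps.td — guard prefixed with TRITON_ so it cannot
// collide with the upstream MLIR AMDGPU dialect's AMDGPU_OPS guard.
#ifndef TRITON_AMDGPU_OPS
#define TRITON_AMDGPU_OPS

// ... op definitions ...

#endif // TRITON_AMDGPU_OPS
```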

}
} // namespace mlir::triton::AMD

class ConvertAMDGPUToLLVM
Collaborator

Nit: can use struct given no private fields anyway.
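The point of the nit: members of a `struct` are public by default, so a type with no private state needs no `public:` label. A toy illustration (`ConvertToLLVMSketch` is a hypothetical stand-in, not the PR's class):

```cpp
// With no private members, `struct` saves the explicit `public:`
// access specifier that the equivalent `class` would require.
struct ConvertToLLVMSketch {
  int runCount = 0;           // public by default in a struct
  void run() { ++runCount; }  // public by default in a struct
};
```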


using namespace mlir;
using namespace mlir::triton;
using namespace mlir::triton::gpu;
Collaborator

Not a fan of just using all namespaces. Can we drop all except the first two? We can consider adding them later if necessary, but I don't want to enable all of them from the get-go.

mlir::LowerToLLVMOptions option(context);

TritonGPUToLLVMTypeConverter typeConverter(context, option);
ModuleAxisInfoAnalysis axisInfoAnalysis(mod);
Collaborator

Don't think we need this right now? Can we delete and add later when really needed?

public:
explicit AMDDialectLLVMConversionTarget(MLIRContext &ctx)
: ConversionTarget(ctx) {
addLegalDialect<LLVM::LLVMDialect>();
Collaborator

You can pass multiple dialects to the same addLegalDialect<...> call.
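A non-compilable sketch of the suggestion, assuming a ConversionTarget named `target` in scope; MLIR's addLegalDialect is a variadic template, so the per-dialect calls in the surrounding excerpt collapse into one:

```cpp
// One variadic call instead of one addLegalDialect call per dialect.
target.addLegalDialect<LLVM::LLVMDialect, mlir::scf::SCFDialect,
                       triton::TritonDialect,
                       triton::gpu::TritonGPUDialect>();
```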

addLegalDialect<mlir::scf::SCFDialect>();
addLegalDialect<triton::TritonDialect>();
addLegalDialect<triton::gpu::TritonGPUDialect>();
addIllegalDialect<triton::nvidia_gpu::TritonNvidiaGPUDialect>();
Collaborator

For illegal ones, maybe use markUnknownOpDynamicallyLegal to return false for all of them? This way we are a bit more future-proof, so we don't need to remember to update this after dialect changes.
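A non-compilable sketch of that suggestion, again assuming a ConversionTarget named `target`; any op not explicitly marked legal is then treated as illegal, so new dialects never need to be enumerated one by one:

```cpp
// Ops with no explicit legality rule fall through to this callback,
// which rejects them all.
target.markUnknownOpDynamicallyLegal([](Operation *) { return false; });
```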

Collaborator
@ThomasRaoux ThomasRaoux left a comment

What would be the level of abstraction of this new dialect? I assume it is the same as TritonGPU?

@@ -64,6 +66,8 @@ void init_triton_amd_passes_ttgpuir(py::module &&m) {
mlir::createTritonAMDGPUStreamPipelinePass);
ADD_PASS_WRAPPER_1("add_stream_pipelinev2",
mlir::createTritonAMDGPUStreamPipelineV2Pass, int);
ADD_PASS_WRAPPER_0("add_amdgpu_to_llvm",
mlir::triton::createConvertAMDGPUToLLVMPass);
Collaborator

Why do we need a separate pass for AMDGPU to LLVM? Ideally this is done with extra patterns on top of the TritonGPU-to-LLVM conversion, since breaking the conversion to LLVM into multiple passes has downsides: it produces an intermediate IR that mixes block and SIMT semantics, and it prevents pattern matching during lowering.

@antiagainst
Collaborator

> What would be the level of abstraction of this new dialect? I assume it is the same as TritonGPU?

Yup, at the TritonGPU level. I just realized that the initial commit message doesn't explain the anticipated usage of this dialect well enough. Basically it's meant to complement TritonGPU in the lowering process on the AMD side, like what we have for the NVIDIA ones. Example ops we are considering right now (not definitely confirmed we will have them):

  • Buffer load/store ops, like upstream amdgpu.raw_buffer_load and store. This way we can immediately generate such ops after canonicalizing pointer calculations, so that we don't need to rediscover which tt.load/tt.store to convert when converting to LLVM.
  • Instruction scheduling related ops. The idea is to introduce some anchor ops so that we can mark "regions" for llvm.amdgpu.sched_barrier and llvm.amdgpu.sched_group_barrier injection at a later step, after LLVM conversion.
  • etc.
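As a purely hypothetical illustration of the first bullet, a TritonGPU-level buffer load op in TableGen might look roughly like this (every name, base class, and type constraint below is invented for the sketch; none of it comes from this PR):

```tablegen
// Hypothetical op sketch only — not an op added by this PR.
def TritonAMDGPU_BufferLoadOp : TT_AMDGPU_Op<"buffer_load"> {
  let summary = "load a tensor through a scalar buffer base pointer";
  let arguments = (ins TT_Ptr:$basePtr, TT_IntTensor:$offsets);
  let results = (outs TT_Tensor:$result);
}
```

The appeal, per the comment above, is that pointer canonicalization can emit this op directly, so the later LLVM conversion no longer has to re-analyze which tt.load ops qualify.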

@ThomasRaoux
Collaborator

> What would be the level of abstraction of this new dialect? I assume it is the same as TritonGPU?
>
> Yup, at the TritonGPU level. I just realized that the initial commit message doesn't explain the anticipated usage of this dialect well enough. Basically it's meant to complement TritonGPU in the lowering process on the AMD side, like what we have for the NVIDIA ones. Example ops we are considering right now (not definitely confirmed we will have them):
>
>   • Buffer load/store ops, like upstream amdgpu.raw_buffer_load and store. This way we can immediately generate such ops after canonicalizing pointer calculations, so that we don't need to rediscover which tt.load/tt.store to convert when converting to LLVM.
>   • Instruction scheduling related ops. The idea is to introduce some anchor ops so that we can mark "regions" for llvm.amdgpu.sched_barrier and llvm.amdgpu.sched_group_barrier injection at a later step, after LLVM conversion.
>   • etc.

That's what I was expecting, and I think it makes sense. One thing to consider for the naming: on the Nvidia side the analogous dialect is called TritonNvidiaGPU, and there is a dialect at the LLVM level of abstraction called NVGPU. So my point is maybe it should be called TritonAMDGPU, although it does feel a bit verbose.

@antiagainst
Collaborator

Yeah that makes sense--we want to be consistent regarding naming there. Better to have that done at the beginning.

Collaborator
@antiagainst antiagainst left a comment

LGTM! Just one nit regarding documentation.

let cppNamespace = "::mlir::triton::amdgpu";

let description = [{
TritonAMDGPU Dialect is used to support AMD specific instructions on TritonGPU level abstraction.
Collaborator

.. hosts AMD specific ops at TritonGPU abstraction level.

Contributor Author

Done. Thanks for the review!

@antiagainst antiagainst marked this pull request as ready for review September 20, 2024 05:11
Collaborator
@ThomasRaoux ThomasRaoux left a comment

LGTM
Note that it will need to be added to RegisterTritonDialects.h to work with our tools

@antiagainst antiagainst merged commit d6a11a4 into triton-lang:main Sep 20, 2024
7 checks passed
3 participants