Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CK-Tile Grouped GEMM refactor and post PR fixes #1756

Merged
merged 29 commits into from
Jan 21, 2025

Conversation

mozga-amd
Copy link
Contributor

This pull-request contains changes as following:

  • Standardizing the function naming,
  • Moving the get_stride function, etc., to a different location,
  • Generalizing the 1D partitioner function for general use and single responsibility,
  • Adding an offset function to calculate the shift.

@aosewski aosewski changed the title Apply refactor to universal gemm to bwd_weight_cshuffle CK-Tile Grouped GEMM refactor and post PR fixes Dec 18, 2024
include/ck_tile/core/arch/arch.hpp Outdated Show resolved Hide resolved
include/ck_tile/core/arch/arch.hpp Show resolved Hide resolved
include/ck_tile/core/arch/arch.hpp Outdated Show resolved Hide resolved
include/ck_tile/core/arch/arch.hpp Outdated Show resolved Hide resolved
include/ck_tile/host/host_tensor.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/gemm_tile_partitioner.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp Outdated Show resolved Hide resolved
@aosewski aosewski dismissed their stale review December 19, 2024 15:55

Will be off from tomorrow till 2nd Jan. Please pass on review to ie. @bartekxk

@mozga-amd mozga-amd requested a review from afagaj as a code owner January 5, 2025 21:18
example/ck_tile/17_grouped_gemm/utils.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/gemm_tile_partitioner.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/gemm_tile_partitioner.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/gemm_tile_partitioner.hpp Outdated Show resolved Hide resolved

static constexpr index_t MPerBlock = BlockGemmShape::kM;
static constexpr index_t NPerBlock = BlockGemmShape::kN;
static constexpr index_t KPerBlock = BlockGemmShape::kK;

CK_TILE_HOST static constexpr auto GridSize(index_t M, index_t N)
CK_TILE_HOST static constexpr auto
GridSize(index_t M, index_t N) noexcept(noexcept(MPerBlock != 0 && NPerBlock != 0)) -> dim3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why not making static_assert right after those constexpr variables ?

Copy link
Contributor Author

@mozga-amd mozga-amd Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly, we additionally provide info to compiler that we do not expect the function to throw an exception in any cases, except in this specific case...

The result will be identical (in fn-level). For class-lvl... the difference lies only in when the exception will be caught and what is caught—either at the class scope level or the function scope level.

In our case, the function will catch the exception at the function call level, not at the class call level. So, it will happen slightly later - but still in compile-time. Additionally, it's obvious that the function will never throw ~ and it's good to mark it as noexcept. Additionally. it could reduce the binary size by removing the handling code

Just let me know, If you want I can use either static_assert in class-scope or fn-scope

include/ck_tile/ops/gemm/kernel/gemm_tile_partitioner.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/gemm_tile_partitioner.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/gemm_tile_partitioner.hpp Outdated Show resolved Hide resolved
include/ck_tile/ops/gemm/kernel/gemm_tile_partitioner.hpp Outdated Show resolved Hide resolved
const index_t NBlock = GetNBlock(N_);

const index_t iM = __builtin_amdgcn_readfirstlane(blockIdx / NBlock);
const index_t iN = __builtin_amdgcn_readfirstlane(modulo(blockIdx, NBlock));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix pls

@mozga-amd mozga-amd merged commit 3c93d3c into develop Jan 21, 2025
10 checks passed
@mozga-amd mozga-amd deleted the mozga-amd/gemm_refactor branch January 21, 2025 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants