Matrix multiplication operator support #52

Open
wanghailu0717 opened this issue Oct 23, 2023 · 2 comments

@wanghailu0717
Contributor

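1. Please dig into the computation performed by the matrix multiplication operator and look for potential performance gains in the following areas:
1.1 Parallelism
1.2 Efficient I/O
1.3 Efficient computation
2. Consider the following features:
2.1 Post-fusion: fusing a following activation or the next operator
2.2 Pre-fusion: fusing the preceding operator
3. Provide a choice of compute kernels, for example CUDA core / Tensor Core on the CUDA platform, and tensor core / convolution core on the BANG platform.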
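
To make the "efficient I/O" point concrete, here is a rough back-of-the-envelope sketch of how output tiling reduces global-memory traffic. The function names are hypothetical, and the model is deliberately simple: it counts element loads only, ignores caches, and assumes the tile sizes divide M and N evenly.

```python
def naive_loads(m, n, k):
    """Element loads when every output element re-reads its row of A and column of B."""
    return 2 * m * n * k


def tiled_loads(m, n, k, tile_m, tile_n):
    """Element loads when each tile_m x tile_n output tile loads its A and B panels once."""
    tiles = (m // tile_m) * (n // tile_n)  # assumption: tile sizes divide m and n
    return tiles * (tile_m * k + tile_n * k)


if __name__ == "__main__":
    m = n = k = 1024
    for tile in (1, 16, 64, 128):
        ratio = naive_loads(m, n, k) / tiled_loads(m, n, k, tile, tile)
        print(f"tile {tile:>3}: {ratio:.0f}x fewer loads than naive")
```

Under this model a square T x T output tile cuts loads by a factor of T, which is why the tile shape is one of the first parameters worth exposing.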

@KuangjuX KuangjuX self-assigned this Nov 13, 2023
@KuangjuX

As of commit 32b9b15, the GEMM code generated with CuTe on InfiniGen runs successfully, and its performance has been compared against cuBLAS. However, code generation currently works by copying a template directly. The next steps are:

  • Making the matrix blocking configurable through tiling parameters.
  • Attempting fusion within the GEMM kernel, such as fusing ReLU and other activation functions.
  • Merging the GEMM Graph and BinaryUnaryGraph into a single graph, and designing a graph that can accommodate multiple operators working together.
  • Designing a standardized testing framework, since the current one is relatively simple and lacks standardization.
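
The first two items can be illustrated with a plain-Python reference sketch. This is not the InfiniGen or CuTe generated code; `tiled_gemm` and its parameters are hypothetical names. It shows configurable tile sizes and an element-wise epilogue applied to each output tile before it is written back, so no second pass over C is needed.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def tiled_gemm(a, b, tile_m=2, tile_n=2, tile_k=2, epilogue=None):
    """Blocked C = A @ B on nested lists, with an optional fused element-wise epilogue."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            i1, j1 = min(i0 + tile_m, m), min(j0 + tile_n, n)
            # Accumulator for one output tile (stands in for registers on a GPU).
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for p0 in range(0, k, tile_k):
                p1 = min(p0 + tile_k, k)
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        partial = 0.0
                        for p in range(p0, p1):
                            partial += a[i][p] * b[p][j]
                        acc[i - i0][j - j0] += partial
            # Fused epilogue: applied while the tile is still local, then stored once.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    value = acc[i - i0][j - j0]
                    c[i][j] = epilogue(value) if epilogue else value
    return c


if __name__ == "__main__":
    a = [[1, -2, 3], [4, 5, -6]]
    b = [[1, 0], [0, 1], [-1, -1]]
    print(tiled_gemm(a, b))
    print(tiled_gemm(a, b, epilogue=relu))
```

A reference like this can also serve as the oracle in a test framework: generated kernels with any tile configuration should match its output within a floating-point tolerance.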

@KuangjuX

KuangjuX commented Nov 20, 2023

Some candidate fused kernels:

  • GEMM + softmax
  • GEMM + add-bias
  • GEMM + GELU
  • GEMM + bias + ReLU
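
As a reference for what these combinations compute, here is a minimal sketch in plain Python. All names are hypothetical, and GELU is taken in its exact erf form, which is an assumption since the issue does not say which variant is intended. Note that bias, ReLU and GELU are element-wise, whereas softmax is a row-wise reduction, so in a tiled kernel it needs the whole output row (or a cross-tile reduction) rather than a single tile.

```python
import math


def matmul(a, b):
    """Unfused reference C = A @ B on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(x):
    return x if x > 0.0 else 0.0


def gelu(x):
    # Exact GELU via the error function (assumed variant).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softmax(row):
    peak = max(row)  # subtract the row maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def gemm_fused(a, b, bias=None, activation=None, row_op=None):
    """GEMM followed by optional bias add, element-wise activation, and row-wise op."""
    out = []
    for row in matmul(a, b):
        if bias is not None:
            row = [v + bv for v, bv in zip(row, bias)]
        if activation is not None:
            row = [activation(v) for v in row]
        if row_op is not None:
            row = row_op(row)
        out.append(row)
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0]]
    b = [[1.0, 0.0], [0.0, 1.0]]
    print(gemm_fused(a, b, bias=[-3.0, 1.0], activation=relu))  # GEMM + bias + ReLU
    print(gemm_fused(a, b, activation=gelu))                    # GEMM + GELU
    print(gemm_fused(a, b, row_op=softmax))                     # GEMM + softmax
```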
