Matrix multiplication operator support #52

wanghailu0717 · 2023-10-23T01:45:41Z

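1. Dig into how the matrix multiplication operator executes and look for potential performance gains in the following areas:
1.1 Parallelism
1.2 Efficient I/O
1.3 Efficient computation
2. Consider the following features:
2.1 Post-fusion: fusing the following activation, or the next operator, into the kernel
2.2 Pre-fusion: fusing the preceding operator into the kernel
3. Provide a choice of compute units, for example CUDA cores / Tensor Cores on the CUDA platform, and tensor cores / convolution cores on the BANG platform.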
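To make the parallelism and I/O points concrete, here is a minimal pure-Python sketch of a blocked (tiled) GEMM. It is not InfiniGen code: the function names and the `tile_m`, `tile_n`, `tile_k` parameters are hypothetical and chosen only for illustration. Each output tile is computed independently of the others, which is what a GPU kernel would map to a thread block, and the K dimension is walked in chunks so that a small slice of each operand is reused while it is resident.

```python
def matmul_naive(a, b):
    """Reference C = A @ B for row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_tiled(a, b, tile_m=2, tile_n=2, tile_k=2):
    """Blocked GEMM sketch; tile sizes are illustrative assumptions."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Each (i0, j0) output tile is independent of every other tile,
    # so these two loops are the natural unit of parallel work.
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            # Walk K in chunks: one chunk of A and B is reused by the
            # whole output tile before moving on to the next chunk.
            for p0 in range(0, k, tile_k):
                for i in range(i0, min(i0 + tile_m, m)):
                    for j in range(j0, min(j0 + tile_n, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile_k, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds handle shapes that are not multiples of the tile size, which a real kernel would handle with predication or padding.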

KuangjuX · 2023-11-19T15:31:00Z

Commit 32b9b15 successfully runs the GEMM code generated by CuTe on InfiniGen, and its performance has been compared against cuBLAS. For now, code generation works by copying a template directly. The next steps are:

Configuring matrix blocking based on tiling.
Attempting fusion within the GEMM kernel, such as fusing relu and other activation functions.
Merging the GEMM Graph and BinaryUnaryGraph into a single graph and designing a graph that can accommodate multiple operators working together.
Designing a standardized testing framework. The current one is relatively simple and lacks standardization, so this still needs brainstorming.
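One possible shape for the standardized test mentioned in the last item is a shape-parameterized comparison against a reference GEMM with an explicit tolerance. The sketch below is an assumption about what such a harness could look like, not the project's framework: `check_kernel`, `reference_matmul`, and the tolerance defaults are all hypothetical names and values.

```python
import math
import random


def reference_matmul(a, b):
    """Straightforward triple-loop GEMM used as the ground truth."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def allclose(x, y, rtol=1e-5, atol=1e-8):
    """Element-wise comparison of two matrices with the same shape."""
    if len(x) != len(y) or any(len(rx) != len(ry) for rx, ry in zip(x, y)):
        return False
    return all(math.isclose(u, v, rel_tol=rtol, abs_tol=atol)
               for rx, ry in zip(x, y) for u, v in zip(rx, ry))


def check_kernel(kernel, shapes, seed=0):
    """Run `kernel` on random inputs for each (m, k, n); return failing shapes."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    failures = []
    for m, k, n in shapes:
        a = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(m)]
        b = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(k)]
        if not allclose(kernel(a, b), reference_matmul(a, b)):
            failures.append((m, k, n))
    return failures
```

Returning the list of failing shapes, rather than stopping at the first mismatch, makes it easier to see whether a bug is tied to particular sizes such as those that are not multiples of the tile size.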

KuangjuX · 2023-11-20T08:36:53Z

Some fused kernels:

GEMM + softmax
GEMM + add-bias
GEMM + GELU
GEMM + bias + ReLU
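As a reference for what the element-wise entries in this list compute, the following pure-Python sketch applies the bias and activation to each accumulator before it is written out, instead of in a separate pass over the result. The function `gemm_fused` and its parameters are hypothetical, and this only illustrates the semantics of epilogue fusion, not the generated CuTe kernel.

```python
import math


def relu(x):
    return max(x, 0.0)


def gelu(x):
    # Exact (erf-based) GELU; kernels often use a tanh approximation instead.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gemm_fused(a, b, bias=None, activation=None):
    """C = activation(A @ B + bias), with the epilogue applied per element."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Epilogue: runs while the accumulator is still "in registers",
            # so the unactivated result is never stored.
            if bias is not None:
                acc += bias[j]  # one bias value per output column
            if activation is not None:
                acc = activation(acc)
            out[i][j] = acc
    return out
```

For example, `gemm_fused(a, b, bias=bias, activation=relu)` corresponds to GEMM + bias + ReLU, and `gemm_fused(a, b, activation=gelu)` to GEMM + GELU. Softmax does not fit this per-element pattern, because each output depends on a reduction over its whole row.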

KuangjuX self-assigned this Nov 13, 2023
