Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Summary:
Pull Request resolved: #3399

X-link: facebookresearch/FBGEMM#487

Just some scaffolding code for grouped GEMM. The idea is that:
* for the router score, we'll move the non-zero entries to the left side, and calculate the indices and the number of non-zeros for each local expert
* grouped GEMM input (needs more discussion):
  * input: 3D tensor [local_expert, tokens, D]
  * input: router_nonzeros tensor: along the M dimension, how many rows need to be calculated
  * output: we need to pad the zero entries with 0 to make it work with CUDA graphs.

We only support bf16 grouped GEMM for now. FP8 grouped GEMM only supports tensor-wise scaling; rowwise scaling has some limitations in CUTLASS that require further work.

Reviewed By: jiawenliu64

Differential Revision: D65260109

fbshipit-source-id: e9b60241c173af34b84d33184262776ca0b38310
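
The idea in the summary can be illustrated with a minimal pure-Python reference sketch. It is not the FBGEMM kernel and makes several assumptions that the commit message does not state: router scores are laid out as one row per local expert, weights are laid out as [local_expert, D, N], padded index slots are filled with 0, and the function names `compact_router_scores` and `grouped_gemm_reference` are hypothetical.

```python
def compact_router_scores(scores):
    """Move non-zero router scores to the left of each expert's row.

    scores: list of rows, one per local expert (assumed layout), each row
    holding one score per token.
    Returns (compacted, indices, nonzeros), where indices[e] lists the token
    positions of the non-zero scores and nonzeros[e] is their count.
    """
    compacted, indices, nonzeros = [], [], []
    for row in scores:
        idx = [t for t, s in enumerate(row) if s != 0]
        pad = len(row) - len(idx)
        compacted.append([row[t] for t in idx] + [0.0] * pad)
        # Padded index slots are filled with 0 here; this is an assumed convention.
        indices.append(idx + [0] * pad)
        nonzeros.append(len(idx))
    return compacted, indices, nonzeros


def grouped_gemm_reference(x, w, router_nonzeros):
    """Reference grouped GEMM with zero-padded output rows.

    x: [local_expert][tokens][D] input.
    w: [local_expert][D][N] weights (assumed layout, not from the commit).
    router_nonzeros: per-expert count of rows along M to compute.
    Returns [local_expert][tokens][N]; rows at or beyond router_nonzeros[e]
    are left as zeros so the output shape never depends on the routing.
    """
    out = []
    for e, (xe, we) in enumerate(zip(x, w)):
        n_cols = len(we[0])
        rows = []
        for m, x_row in enumerate(xe):
            if m < router_nonzeros[e]:
                rows.append([
                    sum(x_row[d] * we[d][n] for d in range(len(x_row)))
                    for n in range(n_cols)
                ])
            else:
                rows.append([0.0] * n_cols)
        out.append(rows)
    return out
```

Keeping the output at a fixed [local_expert, tokens, N] shape, with zeros in the unused rows, is what lets the same call be replayed regardless of how many tokens each expert actually receives.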
1 parent efdb2d0, commit 0505ed8
Showing 2 changed files with 81 additions and 0 deletions.