Commit
* initial moe support
* dynamic grouped gemm
* benchmark
* moe benchmark
* moe sampling
* split-k
* refactor tuning
* simplify
* n-major weight
* add `num` for `MatrixLayout`
* packed rows
* packed cols
* dispatch for packed rows
* w4a16 moe
* refactor model loading
* fix pytorch loader
* refactor
* dispatch w4a16 moe
* fix loader
* add comment
* fix msvc build
* fix msvc build
* fix msvc build
* fix ut
* fix ut
* fix p-lora
* add all support arches
* minor
* fix lint
* fix lint
* fix lint
* fix ut
* bf16 support
* minor
* refactor
* fix lint
* fix ut
* minor
* minor
* minor
* fix inter_size config
* load with non-standard filenames
* fix loader
* fix missing default param
* defer the loading of misc weights for safetensors
* fix conversion
* fix deepseek-vl
* verify model config
* pad inter size by group size and tp
* fix minicpm attn bias & ignore un-needed bias
* set `attn_bias` based on minicpm version
Showing 105 changed files with 5,703 additions and 1,772 deletions.