- FlashAttention-2 support for training
- FlashAttention-2 support for left-padding generation with KV cache (see the sketch after this list)
- FMHA for GQA & MQA
- multi-model topology support via the mpu context
- more model types for experiments (PPL, RM, ...)
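
As an illustration of the left-padding generation path with a KV cache, here is a minimal sketch using the Hugging Face Transformers FlashAttention-2 backend. This project's own model/engine API may differ; the checkpoint name and settings below are placeholders, not part of this repo.

```python
# Illustrative only: FlashAttention-2 + left-padded batched generation with a KV cache.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"            # left padding so new tokens align on the right
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as pad if no pad token is defined

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,            # FA2 requires fp16/bf16
    attn_implementation="flash_attention_2",
).cuda()

prompts = ["Hello, my name is", "The capital of France is"]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")

# use_cache=True keeps the KV cache across decode steps; the attention mask
# tells the kernel which left-padded positions to ignore.
out = model.generate(**batch, max_new_tokens=32, use_cache=True)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```
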
TODO:
- GQA & MQA generation (left-padding)
- fewer model control options
- generator based on non-batched flash attention and custom fused CUDA kernels
- fixed pipeline model
- KV cache management via pre-allocation and reuse (pre-calculated sizes; see the sketch below)
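
The pre-malloc/reuse idea in the last TODO item can be sketched as a statically allocated KV cache: buffers are sized once from the maximum batch and sequence length, written into in place each decode step, and reused across requests instead of concatenating new tensors. All names below (e.g. `StaticKVCache`) are illustrative assumptions, not this project's API.

```python
# Illustrative sketch of a pre-allocated, reusable KV cache (not this repo's implementation).
import torch

class StaticKVCache:
    """Pre-allocated per-layer KV buffers reused across decode steps and requests."""

    def __init__(self, num_layers, max_batch, max_seq_len, num_kv_heads, head_dim,
                 dtype=torch.bfloat16, device="cuda"):
        shape = (num_layers, max_batch, num_kv_heads, max_seq_len, head_dim)
        self.k = torch.empty(shape, dtype=dtype, device=device)
        self.v = torch.empty(shape, dtype=dtype, device=device)
        self.lens = [0] * num_layers  # filled positions per layer

    def reset(self):
        # Reuse the same memory for the next request; no reallocation.
        self.lens = [0] * len(self.lens)

    def append(self, layer, k_new, v_new):
        # k_new / v_new: [batch, num_kv_heads, new_tokens, head_dim]
        b, _, t, _ = k_new.shape
        start = self.lens[layer]
        self.k[layer, :b, :, start:start + t] = k_new
        self.v[layer, :b, :, start:start + t] = v_new
        self.lens[layer] = start + t
        # Return views over the filled region; no copies, no new allocations.
        return self.k[layer, :b, :, :start + t], self.v[layer, :b, :, :start + t]
```
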