SUSTech-data/optimus

Optimus: HuggingFace-Aligned 3D-parallel backend

  • FlashAttention-2 support for training
  • FlashAttention-2 support for left-padded generation with a KV cache
  • Fused multi-head attention (FMHA) for GQA and MQA
  • Support for multiple model topologies through the mpu context
  • More model types for experiments (PPL, RM, ...)
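
To make the GQA and MQA item above concrete, here is a minimal sketch of how query heads are usually grouped onto shared key/value heads. It is an illustration only: the function names `kv_head_for_query_head` and `head_groups` are hypothetical and are not part of the Optimus API, and the contiguous grouping shown here is the common convention, assumed rather than confirmed for this repository.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head shared by query head `q_head`.

    Hypothetical helper for illustration; assumes contiguous grouping.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


def head_groups(num_q_heads: int, num_kv_heads: int) -> list:
    """Map every query head to its KV head."""
    return [
        kv_head_for_query_head(q, num_q_heads, num_kv_heads)
        for q in range(num_q_heads)
    ]


# GQA: 8 query heads share 2 KV heads (4 query heads per group).
gqa = head_groups(8, 2)
# MQA: the special case with a single KV head shared by all query heads.
mqa = head_groups(8, 1)
print(gqa)
print(mqa)
```

With `num_kv_heads == num_q_heads` the same mapping reduces to ordinary multi-head attention, which is why a single kernel path can often cover all three variants.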

TODO:

  • GQA and MQA generation (left-padded)
  • Fewer model control options
  • Generator based on non-batched flash attention and a custom fused CUDA kernel
  • Fixed pipeline model
  • KV cache management through pre-allocation and reuse (pre-calculated)
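
The last TODO item, KV cache pre-allocation and reuse, can be sketched roughly as follows. This is a plain-Python illustration of the idea, not the planned implementation: the class `PreallocatedKVCache` and its methods are hypothetical, and a real version would presumably hold GPU tensors sized from the maximum sequence length rather than Python lists.

```python
class PreallocatedKVCache:
    """Hypothetical sketch: allocate KV storage once, then reuse it.

    Buffers are sized up front from `max_seq_len`, so appending a token
    writes into existing storage instead of allocating new memory.
    """

    def __init__(self, max_seq_len: int, head_dim: int):
        self.max_seq_len = max_seq_len
        self.head_dim = head_dim
        # One-time allocation of the full key and value buffers.
        self.keys = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.values = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.length = 0  # number of valid positions currently stored

    def append(self, key, value):
        """Write one token's key/value into the next free slot, in place."""
        if self.length >= self.max_seq_len:
            raise IndexError("KV cache is full")
        if len(key) != self.head_dim or len(value) != self.head_dim:
            raise ValueError("key/value must have head_dim elements")
        self.keys[self.length][:] = key
        self.values[self.length][:] = value
        self.length += 1

    def view(self):
        """Return only the valid prefix of the key and value buffers."""
        return self.keys[: self.length], self.values[: self.length]

    def reset(self):
        """Mark the cache empty for the next request; storage is kept."""
        self.length = 0


cache = PreallocatedKVCache(max_seq_len=4, head_dim=2)
cache.append([1.0, 2.0], [3.0, 4.0])
cache.reset()                       # new request: reuse the same buffers
cache.append([5.0, 6.0], [7.0, 8.0])
keys, values = cache.view()
```

Resetting only the length counter is what makes reuse cheap: stale entries past `length` are never read and are overwritten by later appends.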
