
v0.2.0

github-actions released this on 22 Aug 01:49

What's Changed

  • kernel: port softcap support for flash attention by @guocuimi in #298
  • test: added unit tests for attention sliding window by @guocuimi in #299
  • model: added gemma2 with softcap and sliding window support by @guocuimi in #300
  • kernel: support kernel test in python via pybind by @guocuimi in #301
  • test: added unit tests for marlin fp16xint4 gemm by @guocuimi in #302
  • fix: move eos out of stop token list to honor ignore_eos option by @guocuimi in #305
  • refactor: move models to upper folder by @guocuimi in #306
  • kernel: port gptq marlin kernel and fp8 marlin kernel by @guocuimi in #307
  • rust: upgrade rust libs to latest version by @guocuimi in #309
  • refactor: remove the logic for loading individual weights from shared partitions by @guocuimi in #311
  • feat: added fused column parallel linear by @guocuimi in #313
  • feat: added gptq marlin qlinear layer by @guocuimi in #312
  • kernel: port awq repack kernel by @guocuimi in #314
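Several changes in this release (#298, #299, #300) concern attention-logit soft-capping and sliding-window masking, the two attention features Gemma 2 relies on. As an unofficial illustration of what those kernels compute, here is a naive pure-Python reference for one attention head; it is not the project's kernel or API. The names `reference_attention` and `soft_cap` are hypothetical, and the window convention (query `i` sees keys `i - window_left` through `i`) is an assumption for this sketch.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def soft_cap(x: float, cap: float) -> float:
    # Squash a logit smoothly into the interval [-cap, cap] via tanh.
    return cap * math.tanh(x / cap)


def reference_attention(
    q: Matrix,
    k: Matrix,
    v: Matrix,
    scale: float,
    softcap: Optional[float] = None,
    window_left: Optional[int] = None,
) -> Matrix:
    """Naive single-head causal attention (hypothetical reference, not a real API).

    Assumes q and k have the same length (prefill, no KV cache). If window_left
    is set, query i only attends to keys j with i - window_left <= j <= i.
    """
    out: Matrix = []
    for i, qi in enumerate(q):
        scores: List[float] = []
        for j, kj in enumerate(k):
            in_window = window_left is None or j >= i - window_left
            if j > i or not in_window:
                scores.append(-math.inf)  # masked: future key or outside the window
                continue
            s = scale * sum(a * b for a, b in zip(qi, kj))
            if softcap is not None:
                s = soft_cap(s, softcap)  # cap the scaled score before softmax
            scores.append(s)
        # Numerically stable softmax; key i is always visible, so m is finite.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) / total for d in range(len(v[0]))]
        )
    return out
```

With `window_left=0`, every query sees only its own key, so each output row equals the matching row of `v`; a very large `softcap` leaves results essentially unchanged. A fused kernel computes the same result tile by tile with an online softmax and can skip key blocks that fall entirely outside the window, whereas this sketch materializes every score for clarity.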

Full Changelog: v0.1.9...v0.2.0