v0.2.0
What's Changed
- kernel: port softcap support for flash attention by @guocuimi in #298
- test: added unit tests for attention sliding window by @guocuimi in #299
- model: added gemma2 with softcap and sliding window support by @guocuimi in #300
- kernel: support kernel test in python via pybind by @guocuimi in #301
- test: added unit tests for marlin fp16xint4 gemm by @guocuimi in #302
- fix: move eos out of stop token list to honor ignore_eos option by @guocuimi in #305
- refactor: move models to upper folder by @guocuimi in #306
- kernel: port gptq marlin kernel and fp8 marlin kernel by @guocuimi in #307
- rust: upgrade rust libs to latest version by @guocuimi in #309
- refactor: remove the logic for loading individual weights from shared partitions by @guocuimi in #311
- feat: added fused column parallel linear by @guocuimi in #313
- feat: added gptq marlin qlinear layer by @guocuimi in #312
- kernel: port awq repack kernel by @guocuimi in #314
Full Changelog: v0.1.9...v0.2.0