Release v0.2.0 · vectorch-ai/ScaleLLM

What's Changed

kernel: port softcap support for flash attention by @guocuimi in #298
test: added unit tests for attention sliding window by @guocuimi in #299
model: added gemma2 with softcap and sliding window support by @guocuimi in #300
kernel: support kernel test in python via pybind by @guocuimi in #301
test: added unit tests for marlin fp16xint4 gemm by @guocuimi in #302
fix: move eos out of stop token list to honor ignore_eos option by @guocuimi in #305
refactor: move models to upper folder by @guocuimi in #306
kernel: port gptq marlin kernel and fp8 marlin kernel by @guocuimi in #307
rust: upgrade rust libs to latest version by @guocuimi in #309
refactor: remove the logic loading individual weight from shared partitions by @guocuimi in #311
feat: added fused column parallel linear by @guocuimi in #313
feat: added gptq marlin qlinear layer by @guocuimi in #312
kernel: port awq repack kernel by @guocuimi in #314
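The softcap (#298, #300) and sliding-window (#299, #300) entries refer to two attention techniques used by Gemma 2. The sketch below is a minimal illustration only. It assumes the common formulation `cap * tanh(logit / cap)` for softcapping. It also assumes a causal window in which each query attends only to the most recent `window` keys. The function names `softcap` and `sliding_window_causal_mask` are hypothetical and are not part of ScaleLLM's API.

```python
import math


def softcap(logit: float, cap: float) -> float:
    """Smoothly squash a logit into [-cap, cap] using cap * tanh(logit / cap).

    Small logits pass through almost unchanged; large ones saturate near +/-cap.
    """
    return cap * math.tanh(logit / cap)


def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a boolean attention mask (hypothetical helper, for illustration).

    mask[q][k] is True when query position q may attend to key position k:
    k must not be in the future (causal), and q - k must be less than `window`,
    so each query sees at most `window` keys including itself.
    """
    return [
        [(k <= q) and (q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    # Softcapping keeps moderate logits close to their original value
    # while bounding very large ones.
    print(softcap(10.0, 50.0), softcap(1000.0, 50.0))

    # With a window of 2, each query sees only itself and the previous token.
    for row in sliding_window_causal_mask(4, 2):
        print(["x" if allowed else "." for allowed in row])
```

In a real kernel both operations are fused into the attention computation. Softcapping is applied to the scaled QK^T scores before softmax. Masked positions are set to negative infinity rather than materialized as a Python list.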

Full Changelog: v0.1.9...v0.2.0