v0.0.9
Major Changes
- Enabled speculative decoding and updated README
What's Changed
- [refactor] add implicit conversion between slice and vector by @guocuimi in #134
- [refactor] change tokenizer special tokens from token to token + id. by @guocuimi in #135
- [feat] support tensor parallelism for MQA/GQA models when num_kv_heads < world_size by @guocuimi in #137
- [refactor] refactoring for sequence by @guocuimi in #140
- [unittest] added more unittests for speculative decoding by @guocuimi in #141
- [unittest] added more unittests for pos_embedding, sampler and rejection_sampler. by @guocuimi in #142
- [feat] added support for kv_cache with different strides. by @guocuimi in #143
- [feat] enable speculative decoding and update readme by @guocuimi in #145
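
Note on #137: when a MQA/GQA model has fewer KV heads than tensor-parallel ranks, one common approach is to replicate each KV head across a group of ranks. The sketch below illustrates that mapping only. It is not taken from the PR: the helper name `kv_heads_for_rank` is hypothetical, and it assumes the head count and world size divide evenly.

```python
from typing import List


def kv_heads_for_rank(num_kv_heads: int, world_size: int, rank: int) -> List[int]:
    """Return the KV head indices a rank would hold (hypothetical helper)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")

    if num_kv_heads >= world_size:
        # Usual case: split the KV heads evenly across ranks.
        if num_kv_heads % world_size != 0:
            raise ValueError("num_kv_heads must be divisible by world_size")
        per_rank = num_kv_heads // world_size
        return list(range(rank * per_rank, (rank + 1) * per_rank))

    # Fewer KV heads than ranks: replicate each head across a group of ranks.
    if world_size % num_kv_heads != 0:
        raise ValueError("world_size must be divisible by num_kv_heads")
    ranks_per_head = world_size // num_kv_heads
    return [rank // ranks_per_head]


if __name__ == "__main__":
    # 2 KV heads shared by 8 ranks: ranks 0-3 hold head 0, ranks 4-7 hold head 1.
    for r in range(8):
        print(r, kv_heads_for_rank(num_kv_heads=2, world_size=8, rank=r))
```

With replication, every rank still owns a distinct slice of the query heads; only the KV projections are duplicated within each group.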
Full Changelog: v0.0.8...v0.0.9