v0.0.9

guocuimi released this 28 Apr 03:50

638e616

Major Changes

Enabled speculative decoding and updated README
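
For readers new to the feature, here is a minimal sketch of the accept-or-resample step used in standard speculative sampling. It is not taken from this project's code: the function name `accept_or_resample` and its parameters are hypothetical, and it handles a single token position using plain Python lists as probability distributions.

```python
import random


def accept_or_resample(draft_token, p_draft, p_target, rng=random):
    """Verify one draft token against the target model's distribution.

    p_draft and p_target are probability lists over the same vocabulary.
    Returns (token, accepted). Hypothetical helper for illustration only.
    """
    q = p_draft[draft_token]
    p = p_target[draft_token]
    # Accept the draft token with probability min(1, p / q).
    if q > 0 and rng.random() < min(1.0, p / q):
        return draft_token, True
    # On rejection, resample from the residual max(0, p_target - p_draft).
    residual = [max(0.0, pt - pd) for pt, pd in zip(p_target, p_draft)]
    total = sum(residual)
    if total == 0.0:
        # Degenerate case: no residual mass, so fall back to the target.
        residual, total = list(p_target), sum(p_target)
    weights = [r / total for r in residual]
    token = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return token, False
```

Under this rule the emitted tokens follow the target distribution, so the draft model affects speed but not output quality. A real engine would run this over batched tensors for several draft positions at once.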

What's Changed

[refactor] add implicit conversion between slice and vector by @guocuimi in #134
[refactor] change tokenizer special tokens from token to token + id by @guocuimi in #135
[feat] support tensor parallelism for MQA/GQA models when num_kv_heads < world_size by @guocuimi in #137
[refactor] refactoring for sequence by @guocuimi in #140
[unittest] added more unit tests for speculative decoding by @guocuimi in #141
[unittest] added more unit tests for pos_embedding, sampler and rejection_sampler by @guocuimi in #142
[feat] added support for kv_cache with different strides by @guocuimi in #143
[feat] enable speculative decoding and update readme by @guocuimi in #145
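
The MQA/GQA change in #137 covers the case where there are fewer KV heads than tensor-parallel ranks. One common way to handle that case is to replicate each KV head across a group of ranks. The sketch below illustrates that mapping only; whether this release does exactly this is not stated in these notes, and `kv_heads_for_rank` is a hypothetical helper that assumes the two sizes divide evenly.

```python
def kv_heads_for_rank(rank, num_kv_heads, world_size):
    """Return the KV head indices a tensor-parallel rank would hold.

    Hypothetical helper: assumes one size evenly divides the other.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    if num_kv_heads >= world_size:
        # Usual case: split the KV heads evenly across ranks.
        if num_kv_heads % world_size != 0:
            raise ValueError("num_kv_heads must be divisible by world_size")
        per_rank = num_kv_heads // world_size
        return list(range(rank * per_rank, (rank + 1) * per_rank))
    # Fewer KV heads than ranks: each head is replicated on a group of ranks.
    if world_size % num_kv_heads != 0:
        raise ValueError("world_size must be divisible by num_kv_heads")
    replicas = world_size // num_kv_heads
    return [rank // replicas]
```

For example, with 2 KV heads on 8 ranks, ranks 0 to 3 would hold head 0 and ranks 4 to 7 would hold head 1. With MQA (a single KV head) every rank would hold a copy of head 0.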

Full Changelog: v0.0.8...v0.0.9

Contributors

guocuimi
