v0.0.9

@guocuimi released this 28 Apr 03:50
· 236 commits to main since this release
638e616

Major Changes

  • Enabled speculative decoding and updated README

What's Changed

  • [refactor] add implicit conversion between slice and vector by @guocuimi in #134
  • [refactor] change tokenizer special tokens from token to token + id by @guocuimi in #135
  • [feat] support tensor parallelism for MQA/GQA models when num_kv_heads < world_size by @guocuimi in #137
  • [refactor] refactoring for sequence by @guocuimi in #140
  • [unittest] added more unit tests for speculative decoding by @guocuimi in #141
  • [unittest] added more unit tests for pos_embedding, sampler and rejection_sampler by @guocuimi in #142
  • [feat] added support for kv_cache with different strides by @guocuimi in #143
  • [feat] enable speculative decoding and update README by @guocuimi in #145
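
Regarding #137: when an MQA/GQA model has fewer KV heads than tensor-parallel ranks, the KV heads cannot be split one per rank, so a common approach is to replicate each KV head across a group of ranks. The Python sketch below illustrates that general idea only. It is not taken from this project's code: the function names, the contiguous rank-to-head layout, and the requirement that world_size be a multiple of num_kv_heads are all assumptions made for illustration.

```python
from typing import List


def kv_head_for_rank(rank: int, world_size: int, num_kv_heads: int) -> int:
    """Return the index of the KV head that `rank` holds a replica of.

    Hypothetical helper: assumes query heads are sharded contiguously across
    ranks, so consecutive ranks share the same KV head.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    if num_kv_heads >= world_size:
        raise ValueError("replication only applies when num_kv_heads < world_size")
    if world_size % num_kv_heads != 0:
        # Assumption of this sketch: every KV head gets the same replica count.
        raise ValueError("world_size must be a multiple of num_kv_heads")
    replicas_per_head = world_size // num_kv_heads
    return rank // replicas_per_head


def kv_head_assignment(world_size: int, num_kv_heads: int) -> List[int]:
    """List the KV head index held by each rank, ordered by rank."""
    return [kv_head_for_rank(r, world_size, num_kv_heads) for r in range(world_size)]


if __name__ == "__main__":
    # GQA model with 2 KV heads on 8 ranks: each KV head is replicated 4 times.
    print(kv_head_assignment(world_size=8, num_kv_heads=2))
```

With this layout every rank keeps a full copy of exactly one KV head, so the KV cache per rank stays the same size as in the num_kv_heads == world_size case.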

Full Changelog: v0.0.8...v0.0.9