# LMDeploy Release V0.5.3

## What's Changed

### 🚀 Features
- PyTorch Engine AWQ support by @grimoire in #1913 (see the sketch after this list)
- Phi-3 AWQ support by @grimoire in #1984
- Fix chunked prefill by @lzhangzz in #2201
- Support VLMs that use Qwen as the language model by @irexyc in #2207
### 💥 Improvements
- Support specifying a prefix of the assistant response by @AllentDan in #2172 (see the sketch after this list)
- Strict check for `name_map` in `InternLM2Chat7B` by @SamuraiBUPT in #2156
- Check errors for attention kernels by @lzhangzz in #2206
- Update the base image in the dockerfile to support CUDA 12.4 by @RunningLeon in #2182
- Stop synchronizing for `length_criterion` by @lzhangzz in #2202
- Adapt to the new MiniCPM-Llama3-V-2_5 code by @irexyc in #2139
- Remove duplicate code by @cmpute in #2133
### 🐞 Bug fixes
- [Hotfix] Add missing parentheses when calculating the coefficient of the Llama 3 RoPE by @lvhan028 in #2157
- Support logit softcap by @grimoire in #2158
- Fix gmem to smem WAW conflict in awq gemm kernel by @foreverrookie in #2111
- Fix gradio serve using a wrong chat template by @AllentDan in #2131
- Fix a runtime error when using dynamic scale rotary embedding for InternLM2… by @CyCle1024 in #2212
- Add peer-access-enabled allocator by @lzhangzz in #2218
- Fix typos in profile_generation.py by @jiajie-yang in #2233
### 📚 Documentation
- docs: fix Qwen typo by @ArtificialZeng in #2136
- Fix a wrong expression by @ArtificialZeng in #2165
- Clarify the model type (LLM or MLLM) in the supported-model matrix by @lvhan028 in #2209
- docs: add Japanese README by @eltociear in #2237
### 🌐 Other
- Bump version to 0.5.2.post1 by @lvhan028 in #2159
- Update news about the cooperation with modelscope/swift by @lvhan028 in #2200
- Bump version to v0.5.3 by @lvhan028 in #2242
## New Contributors
- @ArtificialZeng made their first contribution in #2136
- @foreverrookie made their first contribution in #2111
- @SamuraiBUPT made their first contribution in #2156
- @CyCle1024 made their first contribution in #2212
- @jiajie-yang made their first contribution in #2233
- @cmpute made their first contribution in #2133
**Full Changelog**: v0.5.2...v0.5.3