Release v0.3.2 · EricLBuehler/mistral.rs

Key changes

General improvements and fixes
ISQ FP8 (in-situ quantization to FP8)
GPTQ Marlin support (4-bit and 8-bit)
26% faster decoding (T/s) on Metal
Python package wheels are available; see the various PyPI packages.

What's Changed

Update docs and deps by @EricLBuehler in #804
Support Qwen 2.5 by @EricLBuehler in #805
Update docs with clarifications and notes by @EricLBuehler in #806
Improve attention mask inversion by @EricLBuehler in #811
Fix repeat_interleave by @EricLBuehler in #812
Use f32 for neg inf in cross attn mask by @EricLBuehler in #814
Improve UQFF memory efficiency by @EricLBuehler in #813
Update Metal, CUDA Candle impls and ISQ by @EricLBuehler in #816
chore: update pagedattention.cu by @eltociear in #822
MLlama - if f16, load vision model in f32 by @EricLBuehler in #820
ci: Upgrade actions by @polarathene in #823
docs: add a back-to-top button because of README length by @bhargavshirin in #833
Fix typo in model architecture enum error message by @nikolaydubina in #835
Expose config for Rust API, tweak ModelKind by @EricLBuehler in #841
Add ISQ FP8 by @EricLBuehler in #832
Fix Metal F8 build errors by @EricLBuehler in #846
Bump pyo3 from 0.22.3 to 0.22.4 by @dependabot in #854
Generate standalone UQFF models by @EricLBuehler in #849
Update README.MD by @kaleaditya779 in #848
Add GPTQ Marlin support for 4 and 8 bit by @EricLBuehler in #856
Add wrap_help feature to clap by @DaveTJones in #858
Patch UQFF Metal generation by @EricLBuehler in #857
Add GGUF Qwen 2 by @EricLBuehler in #860
Avoid duplicate Metal command buffer encodings during ISQ by @EricLBuehler in #861
Fix for isnanf by @EricLBuehler in #859
Fix some Metal warnings by @EricLBuehler in #862
Support interactive mode markdown bold/italics via ANSI codes by @EricLBuehler in #879
Even better V-Llama accuracy by @EricLBuehler in #881
Trim whitespace (such as carriage returns) from nvidia-smi output by @asaddi in #880
MODEL_ID not "MODEL_ID" by @simonw in #863
Sync ggml metal kernels by @EricLBuehler in #885
Increase Metal decoding T/s by 26% by @EricLBuehler in #887
Remove pretty-printer by @EricLBuehler in #889
Fix typo in documentation by @msk in #888
Fix Half-Quadratic Quantization and Dequantization on CPU by @haricot in #873
Prepare for v0.3.2 by @EricLBuehler in #891

New Contributors

@bhargavshirin made their first contribution in #833
@nikolaydubina made their first contribution in #835
@kaleaditya779 made their first contribution in #848
@DaveTJones made their first contribution in #858
@asaddi made their first contribution in #880
@simonw made their first contribution in #863
@msk made their first contribution in #888
@haricot made their first contribution in #873

Full Changelog: v0.3.1...v0.3.2
