v1.3.0
Changes:
- llama.cpp updated to b3190
- Added support for DeepSeek-V2 and GPT-NeoX (Pythia and others)
- Added support for Markdown formatting
- Added support for using history in Shortcuts
- Added Flash Attention support
- Added NPredict option
- Metal and CPU inference improvements
- Sampling and eval improvements
- Some fixes for Phi-3 and MiniCPM
- Fixed some errors
- Added Qwen template