Releases · guinmoon/LLMFarm
v0.6.0
Changes
- Added grammar sampling for llama models; you can put .gbnf files in the grammars directory
- llama.cpp updated to b1256
- rwkv updated to 8db73b1
- gpt-2 updated
- rwkv_eval_sequence: ~20% speed increase
- GGML_ASSERT is now handled
- Fixed many errors
- New llama2 and saiga templates
** Due to a ScrollViewReader error, autoscroll is disabled on iOS < 16.4
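As a hedged illustration of the grammar feature above: GBNF is llama.cpp's grammar format, where a `root` rule constrains what the model may emit. The file name and rule below are hypothetical, minimal examples, not files shipped with LLMFarm.

```shell
# Create a minimal, hypothetical GBNF grammar that constrains the
# model's output to either "yes" or "no".
mkdir -p grammars
cat > grammars/yes_no.gbnf <<'EOF'
root ::= ("yes" | "no")
EOF
```

Once a .gbnf file is in the grammars directory, it can be selected as the sampling grammar for a llama-family model.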
v0.5.2
Changes
- Added mmap and mlock options
- Added a prompt format text editor with multiline support
- Added tfs_z and typical_p sampling parameters
- Fixed many errors on model loading
- Fixed a scrolling issue
- UI improvements
- Template improvements
** Due to a ScrollViewReader error, autoscroll is disabled on iOS < 16.4
v0.5.0
Changes
- llama.cpp updated to b1132: GGUF format support and a speed increase. The old file format is still supported but uses llama.cpp dadbed9.
- Added Falcon model support (GGUF only)
- Added a template for RWKV-4
- Fixed model renaming
- Fixed some UI bugs that could cause the app to crash
- Fixed llama and replit token_to_str
** In order to use llama.cpp b1132, the model file must have the .gguf extension.
*** Unfortunately, due to a bug in the latest versions of llama.cpp, Metal is not supported on Intel Macs at this time.
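Since only files with a .gguf extension are routed to llama.cpp b1132, it can help to confirm that a downloaded file really is a GGUF container before renaming it: GGUF files begin with the 4-byte ASCII magic `GGUF`. The helper name and file names below are illustrative, not part of LLMFarm.

```shell
# Hypothetical helper: a GGUF file starts with the ASCII magic "GGUF"
# in its first 4 bytes; return success only if that magic is present.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}
```

Usage: `is_gguf model.gguf && echo "GGUF container"` (with `model.gguf` as a placeholder path).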
v0.4.5
Changes
- llama.cpp updated to dadbed9: a noticeable increase in Metal speed on iOS. A 7B qK_3 model now works fine on an iPhone 12
- Added model management
- Added a template for running LLaMA 2 on iPhone
- Fixed the template context size setting
Now you can install LLMFarm on iOS devices with TestFlight
** llmfarm_core has been moved to a separate repository. To build LLMFarm, you need to clone this repository recursively:
git clone --recurse-submodules https://github.com/guinmoon/LLMFarm
v0.4.2
v0.4.0
Changes
- Added RWKV inference support (currently only the 20b tokenizer). Tested on these models.
v0.3.2
Changes
- llama.cpp updated to 84e09a7d8bc4ab6d658b5cd81295ac0add60be78: a noticeable speed increase for 3B models on iOS with Metal; the QKK_64 build can be used for 3B models quantized with k_quants. See more details here
- Added a reverse prompt option to stop prediction
- Added predict time to messages
v0.3.0
Changes
- Added Starcoder (Santacoder) inference, tested on this model
- Added model settings templates for quick setup of prompt format and model parameters
- llama.cpp and ggml updated
- UI improvements