-
Would highly recommend supporting MLX. It runs natively on Apple Silicon, is fast, and is likely to become the de facto inference engine for Apple M-series chips. It's come a long, long way. There's an entire community of models with great quantization options: mlx-community (MLX Community). It now even supports a native distributed mode, so you can run inference across multiple Mac devices: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/examples/pipeline_generate.py I'd be happy to help work on this if there's interest in something beyond Ollama.
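
To give a feel for the API, here's a rough sketch of single-machine generation with mlx-lm. The model name is just an example mlx-community checkpoint, and the exact call signatures may differ slightly between versions:

```python
# Rough sketch of single-machine generation with mlx-lm (pip install mlx-lm).
# The model name below is just an example mlx-community checkpoint.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

messages = [{"role": "user", "content": "Explain MLX in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True also prints generation speed and memory usage
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```

The distributed pipeline example linked above builds on the same load/generate flow, just sharded across machines.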
-
Since Goose does not support LM Studio as an LLM provider, I built an Ollama proxy to convert your queries. It's working with MLX models. Check it out, hope it helps!
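
For anyone wondering what the proxy boils down to: it essentially translates Ollama-style /api/chat requests into the OpenAI-compatible /v1/chat/completions endpoint that LM Studio serves locally. This is not the actual proxy, just a rough, non-streaming sketch of the idea; the ports, paths, and field mapping are assumptions to adapt to your setup:

```python
# Rough sketch: accept Ollama-style /api/chat requests and forward them to
# LM Studio's OpenAI-compatible server (default http://localhost:1234).
# Non-streaming only; ports, paths, and field names are assumptions to adapt.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

@app.post("/api/chat")
def chat():
    body = request.get_json(force=True)
    upstream = requests.post(
        LMSTUDIO_URL,
        json={
            "model": body.get("model", "local-model"),
            "messages": body.get("messages", []),
            "stream": False,
        },
        timeout=300,
    )
    upstream.raise_for_status()
    content = upstream.json()["choices"][0]["message"]["content"]
    # Reply in roughly the shape Ollama clients expect from /api/chat.
    return jsonify({
        "model": body.get("model", "local-model"),
        "message": {"role": "assistant", "content": content},
        "done": True,
    })

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=11434)  # Ollama's default port
```

Point your Ollama client at this server and it should see LM Studio (and whatever MLX model it has loaded) as if it were Ollama.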
-
MLX is like Ollama, but only for ARM Macs, and it's faster than Ollama. Here are links for the project:
- Site
- GitHub
- Docs from Hugging Face
- Official examples