Add support for LMStudio + MLX for maximal speed and efficiency on Apple Silicon #1495
Is your feature request related to a problem? Please describe.
With the recent major improvements to LMStudio, including headless mode, it is now a powerful alternative to Ollama with a lot of appealing features, including native support for MLX on Apple Silicon devices, which offers huge inference-time improvements for local models.
Based on my tests, I achieved a 300-500% speed improvement over Ollama with the same model (Llama3.2-3B).

Comments
I've been thinking about this as well… It seems that this is already possible with LiteLLM as is, but it would be great if they had a full integration. Part of the reason we wouldn't look to support this natively in R2R (similar to how we no longer support Ollama directly and instead route through LiteLLM) is that we don't consider these integrations a core part of our infrastructure. Rather than focusing on maintaining integrations, we would look to contribute to LiteLLM and our other dependencies. I'll play around with this over the weekend and add some information to the docs. Let us know if you're able to get it working!
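For reference, a minimal sketch of what "possible with LiteLLM as is" can look like: LM Studio exposes an OpenAI-compatible server (on http://localhost:1234/v1 by default), so LiteLLM's OpenAI-compatible routing can point at it directly. The model name and key below are assumptions, not values from R2R or this thread.

```python
import litellm

# Route a chat completion through LiteLLM's OpenAI-compatible provider,
# pointed at a locally running LM Studio server.
response = litellm.completion(
    model="openai/llama-3.2-3b-instruct",  # hypothetical model id loaded in LM Studio
    api_base="http://localhost:1234/v1",   # LM Studio's default local endpoint
    api_key="lm-studio",                   # placeholder; the local server ignores the key
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```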
@NolanTrem The approach makes total sense. If there were 10 other providers, it would make sense to have a higher-level router handle them, with you just using it.
Had a chance to play around with LMStudio and was extremely impressed by its performance over Ollama. There were a few changes that I had to make to our LiteLLM provider file in order to get embeddings to work (which just involved dropping unsupported parameters). I'll look to make this a permanent change, as I would be inclined to switch over to LMStudio for most of my testing going forward. Here's the config that I ended up running with:
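The exact provider-file change and config aren't reproduced here; as a rough sketch of the behavior described above (dropping parameters the backend rejects), LiteLLM's `drop_params` setting can be enabled when embedding against LM Studio's OpenAI-compatible server. The model name, endpoint, and key are assumptions, not the actual values used.

```python
import litellm

# Drop request parameters the OpenAI-compatible backend does not accept,
# instead of failing the call (can also be passed per-call as drop_params=True).
litellm.drop_params = True

# Hypothetical embedding model loaded in LM Studio; the server defaults to
# http://localhost:1234/v1 and does not validate the API key.
response = litellm.embedding(
    model="openai/nomic-embed-text-v1.5",
    api_base="http://localhost:1234/v1",
    api_key="lm-studio",
    input=["A quick LM Studio embedding test."],
)
print(response)
```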