Inquiry Regarding vLLM Support for Mac Metal API #2081
Comments
Torch has officially supported Metal for a while now. Would adding support in vLLM be as simple as changing the hardcoded device name?
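For context, a minimal sketch of the PyTorch MPS (Metal) backend being referred to; it only shows that plain torch ops can run on the Apple GPU, not that vLLM's custom kernels would:

```python
# Minimal check of PyTorch's Metal (MPS) backend.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
    x = torch.randn(4, 4, device=device)
    print((x @ x).device)  # prints mps:0; the matmul runs on the Apple GPU via Metal
else:
    print("MPS not available; PyTorch falls back to CPU.")
```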
Anyone? I'd be happy to rewrite the implementation without the hardcoded device name; I just don't want to spend hours going down a dead end.
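A sketch of the kind of device indirection this offer implies, assuming the hardcoded string can be funneled through a single helper (the function name is hypothetical):

```python
# Hypothetical helper replacing a hardcoded "cuda" device string.
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS, then plain CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

On its own this would not be enough, since vLLM's custom CUDA kernels (paged attention, quantization, etc.) would still need Metal equivalents.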
I'd like to see this work as well; there's a lot of Metal hardware out there.
Same here, please.
+1
Wish it could be implemented 🥺
My offer still stands if someone on the project can answer the above questions.
@pathorn says they have an implementation that runs on M3 chips in #176 (comment). Do you think it could be adapted to the new CPU backend that was added in #3634?
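For anyone who wants to try, a hedged sketch of how the CPU backend from #3634 is exercised; it assumes a CPU-only build of vLLM (e.g. installed with VLLM_TARGET_DEVICE=cpu), flag and parameter names may differ across versions, and the model is just a cheap placeholder:

```python
# Sketch, assuming a CPU build of vLLM (e.g. VLLM_TARGET_DEVICE=cpu pip install -e .).
# On such a build the engine routes to the CPU backend added in #3634.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model keeps the CPU run cheap
outputs = llm.generate(["Hello, world!"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```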
FYI for anyone who wants to see that PR: #2244 (comment). @pathorn did some tremendous work on it. However, llama.cpp still performs faster by a mile, so this may not be a fruitful endeavour after all.
Closing because other projects such as llama.cpp are more appropriate for running LLMs on Apple silicon. |
I agree it makes sense, but I would not be too concerned by the performance. I have an M3 Mac provided by my company, and the development loop is really slow when it comes to vLLM. Suboptimal as it would be, I'd love to just be able to run vLLM locally.
Also want to +1 this idea: it would be much cheaper to buy a 192GB Apple Silicon machine to run the full Mixtral 8x7B than to shell out for an NVIDIA rig with equivalent memory.
Hi all, I have a draft patch that tries to integrate vLLM with Metal. It runs, but unfortunately it doesn't produce any reasonable results, probably due to MPS<->CPU data movement and fallbacks. So I've submitted the patch and marked it as a draft, in case someone can help :)
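A sketch of the fallback behavior suspected here; PYTORCH_ENABLE_MPS_FALLBACK is PyTorch's documented escape hatch for missing MPS kernels, though whether it explains the bad outputs is only a guess:

```python
# Sketch: enable PyTorch's CPU fallback for ops that lack an MPS kernel.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must be set before torch is imported

import torch

x = torch.randn(1024, 1024, device="mps")
y = x @ x  # has an MPS kernel, stays on the GPU
# Ops without an MPS kernel would normally raise NotImplementedError; with the
# flag above they are silently copied to CPU, computed, and copied back, which
# is slow and, combined with dtype casts, can perturb results.
print(y.device)
```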
Dear vLLM Maintainers,
I hope this message finds you well. I am reaching out to inquire about the potential for integrating Mac Metal API support within the vLLM framework. As an avid user and advocate for vLLM's capabilities, I have been thoroughly impressed with its performance and flexibility across various platforms and hardware configurations.
Given the increasing prevalence of Mac devices in the machine learning community and the performance benefits offered by Apple's Metal API for GPU-accelerated computing, I am curious to know if there are any plans to extend vLLM's compatibility to include Metal support. This would undoubtedly be a significant boon for researchers and developers working in Mac environments who wish to leverage vLLM's impressive suite of features.
Could you please shed some light on the following aspects:
I understand that integrating a new backend such as Metal may present a variety of challenges, but I believe the potential benefits to the user community could be substantial. I am keen to offer my assistance, whether it be through testing, development, or documentation, to help bring this capability to fruition.
Thank you for your time and consideration. I eagerly await your response and am excited about the prospect of further enhancing vLLM's accessibility and performance on Mac platforms.
Best regards,
yihong1120