
Inquiry Regarding vLLM Support for Mac Metal API #2081

Closed
yihong1120 opened this issue Dec 13, 2023 · 13 comments · May be fixed by #12640

Comments

@yihong1120

Dear vLLM Maintainers,

I hope this message finds you well. I am reaching out to inquire about the potential for integrating Mac Metal API support within the vLLM framework. As an avid user and advocate for vLLM's capabilities, I have been thoroughly impressed with its performance and flexibility across various platforms and hardware configurations.

Given the increasing prevalence of Mac devices in the machine learning community and the performance benefits offered by Apple's Metal API for GPU-accelerated computing, I am curious to know if there are any plans to extend vLLM's compatibility to include Metal support. This would undoubtedly be a significant boon for researchers and developers working on Mac environments who wish to leverage vLLM's impressive suite of features.

Could you please shed some light on the following aspects:

  1. Are there any ongoing efforts or discussions around incorporating Metal API support into vLLM?
  2. If such plans are in the pipeline, what is the anticipated timeline for the availability of this feature?
  3. How might the community contribute to expediting this process, and are there specific areas where contributions are most needed?

I understand that integrating a new backend such as Metal may present a variety of challenges, but I believe the potential benefits to the user community could be substantial. I am keen to offer my assistance, whether it be through testing, development, or documentation, to help bring this capability to fruition.

Thank you for your time and consideration. I eagerly await your response and am excited about the prospect of further enhancing vLLM's accessibility and performance on Mac platforms.

Best regards,
yihong1120

@jagtesh

jagtesh commented Jan 30, 2024

Torch has officially supported Metal for a while now. Would adding support in vLLM be as simple as changing device="cuda" to "mps" on Macs? Are there any other dependencies on CUDA?
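For concreteness, the device switch being asked about looks roughly like this in plain PyTorch. This is only a minimal sketch of MPS device selection; whether it is sufficient for vLLM, which also ships custom CUDA kernels, is exactly the open question.

```python
# Minimal sketch of PyTorch device selection with the MPS (Metal) backend.
# This only covers plain torch tensors; vLLM's custom CUDA kernels are a
# separate dependency that this switch alone would not address.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple Metal Performance Shaders
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

x = torch.randn(2, 3, device=device)  # tensor is allocated on the selected backend
print(x.device)
```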

@jagtesh

jagtesh commented Feb 2, 2024

> Torch has officially supported Metal for a while now. Would adding support in vLLM be as simple as changing device="cuda" to "mps" on Macs? Are there any other dependencies on CUDA?

Anyone? I'd be happy to rewrite the implementation without the hardcoded device name - just don't want to spend hours down a dead-end.

@C0deMunk33

I'd like to see this work as well, lots of Metal out there

@bluenevus

same here please

@chen-bowen

+1

@nostaljic

Wish it could be implemented🥺

@jagtesh

jagtesh commented Apr 14, 2024

My offer still stands if someone on the project can answer the above questions.

@hmellor
Collaborator

hmellor commented Apr 18, 2024

@pathorn says they have an implementation that runs on M3 chips in #176 (comment)

Do you think it could be adapted to the new CPU backend that was added in #3634?

@jagtesh

jagtesh commented Apr 19, 2024

> @pathorn says they have an implementation that runs on M3 chips in #176 (comment)
>
> Do you think it could be adapted to the new CPU backend that was added in #3634?

FYI for anyone who wants to see that PR: #2244 (comment). @pathorn did some tremendous work on the PR. However, llama.cpp still performs faster - by a mile. This may not be a fruitful endeavour after all.

@hmellor
Collaborator

hmellor commented Sep 20, 2024

Closing because other projects such as llama.cpp are more appropriate for running LLMs on Apple silicon.

@hmellor closed this as not planned on Sep 20, 2024
@baggiponte

> Closing because other projects such as llama.cpp are more appropriate for running LLMs on Apple silicon.

I agree it makes sense, but I would not be too concerned about the performance. I have an M3 Mac provided by my company and the development loop is really slow when it comes to vLLM. Even if suboptimal, I'd love to just be able to run [uv] pip install vllm and be free to experiment locally, then push to prod.
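For reference, the local loop described here is just vLLM's standard offline API; a sketch of what one would like to be able to run on a Mac after a plain pip install (the model name is an arbitrary small example):

```python
# Sketch of the local experimentation loop described above, using vLLM's
# standard offline API; the model is just a small arbitrary example for a
# quick smoke test before pushing to a production deployment.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```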

@RobotSail

Also want to +1 this idea: it would be much cheaper to buy a 192GB Apple Silicon machine to run the full Mixtral 8x7B than to shell out for an NVIDIA rig with equivalent memory.

@skyzh

skyzh commented Feb 1, 2025

Hi all, I have a draft patch that tries to integrate vLLM with Metal. It runs, but unfortunately it doesn't produce any reasonable results, probably due to MPS<->CPU data movement and fallbacks. So I've submitted the patch and marked it as a draft, in case someone can help :)

#12640
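For anyone picking this up, here is a minimal sketch (not taken from that patch) of how the MPS<->CPU fallback behaviour can be exercised with PyTorch's own switch; the ops shown are arbitrary supported examples.

```python
# PYTORCH_ENABLE_MPS_FALLBACK is PyTorch's own switch: with it set, ops that
# lack an MPS kernel silently run on the CPU instead of raising, which is
# convenient but can hide the data-movement and correctness problems
# described above.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # set before importing torch so the flag is picked up

import torch

if not torch.backends.mps.is_available():
    raise SystemExit("This sketch assumes an Apple-silicon Mac with the MPS backend available")

device = torch.device("mps")
x = torch.randn(8, 8, device=device)
y = (x @ x).softmax(dim=-1)  # these ops have MPS kernels; ops without one fall back to CPU under the flag
print(y.device)
```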
