
Inquiry Regarding vLLM Support for Mac Metal API #2081

Closed
yihong1120 opened this issue Dec 13, 2023 · 13 comments · May be fixed by #12640

Comments

@yihong1120

Dear vLLM Maintainers,

I hope this message finds you well. I am reaching out to inquire about the potential for integrating Mac Metal API support within the vLLM framework. As an avid user and advocate for vLLM's capabilities, I have been thoroughly impressed with its performance and flexibility across various platforms and hardware configurations.

Given the increasing prevalence of Mac devices in the machine learning community and the performance benefits offered by Apple's Metal API for GPU-accelerated computing, I am curious to know if there are any plans to extend vLLM's compatibility to include Metal support. This would undoubtedly be a significant boon for researchers and developers working on Mac environments who wish to leverage vLLM's impressive suite of features.

Could you please shed some light on the following aspects:

  1. Are there any ongoing efforts or discussions around incorporating Metal API support into vLLM?
  2. If such plans are in the pipeline, what is the anticipated timeline for the availability of this feature?
  3. How might the community contribute to expediting this process, and are there specific areas where contributions are most needed?

I understand that integrating a new backend such as Metal may present a variety of challenges, but I believe the potential benefits to the user community could be substantial. I am keen to offer my assistance, whether it be through testing, development, or documentation, to help bring this capability to fruition.

Thank you for your time and consideration. I eagerly await your response and am excited about the prospect of further enhancing vLLM's accessibility and performance on Mac platforms.

Best regards,
yihong1120

@jagtesh

jagtesh commented Jan 30, 2024

Torch has officially supported Metal for a while now. Would adding support in vLLM be as simple as changing device="cuda" to "mps" on Macs? Are there any other dependencies on CUDA?
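For concreteness, the device switch being asked about looks roughly like this in plain PyTorch. This is only a minimal sketch of MPS device selection; whether it is sufficient for vLLM, which also ships custom CUDA kernels, is exactly the open question.

```python
# Minimal sketch of PyTorch device selection with the MPS (Metal) backend.
# This only covers plain torch tensors; vLLM's custom CUDA kernels are a
# separate dependency that this switch alone would not address.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple Metal Performance Shaders
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

x = torch.randn(2, 3, device=device)  # tensor is allocated on the selected backend
print(x.device)
```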

@jagtesh

jagtesh commented Feb 2, 2024

> Torch has officially supported Metal for a while now. Would adding support in vLLM be as simple as changing device="cuda" to "mps" on Macs? Are there any other dependencies on CUDA?

Anyone? I'd be happy to rewrite the implementation without the hardcoded device name - just don't want to spend hours down a dead-end.

@C0deMunk33

I'd like to see this work as well, lots of Metal out there

@bluenevus

same here please

@chen-bowen

+1

@nostaljic

Wish it could be implemented🥺

@jagtesh

jagtesh commented Apr 14, 2024

My offer still stands if someone on the project can answer the above questions.

@hmellor
Collaborator

hmellor commented Apr 18, 2024

@pathorn says they have an implementation that runs on M3 chips in #176 (comment)

Do you think it could be adapted to the new CPU backend that was added in #3634?

@jagtesh

jagtesh commented Apr 19, 2024

> @pathorn says they have an implementation that runs on M3 chips in #176 (comment)
>
> Do you think it could be adapted to the new CPU backend that was added in #3634?

FYI for anyone who wants to see that PR: #2244 (comment). @pathorn did some tremendous work on the PR. However, llama.cpp still performs faster - by a mile. This may not be a fruitful endeavour after all.

@hmellor
Collaborator

hmellor commented Sep 20, 2024

Closing because other projects such as llama.cpp are more appropriate for running LLMs on Apple silicon.

@hmellor closed this as not planned on Sep 20, 2024
@baggiponte

> Closing because other projects such as llama.cpp are more appropriate for running LLMs on Apple silicon.

I agree it makes sense, but I would not be too concerned about the performance. I have an M3 Mac provided by my company and the development loop is really slow when it comes to vLLM. Even if suboptimal, I'd love to just be able to run [uv] pip install vllm and be free to experiment locally, then push to prod.
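For reference, the local loop described here is just vLLM's standard offline API; a sketch of what one would like to be able to run on a Mac after a plain pip install (the model name is an arbitrary small example):

```python
# Sketch of the local experimentation loop described above, using vLLM's
# standard offline API; the model is just a small arbitrary example for a
# quick smoke test before pushing to a production deployment.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```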

@RobotSail

Also want to +1 this idea: it would be much cheaper to buy a 192GB Apple Silicon machine to run the full Mixtral 8x7B than to shell out for an NVIDIA rig with equivalent memory.

@skyzh

skyzh commented Feb 1, 2025

Hi all, I have a draft patch that tries to integrate vLLM with Metal. It runs, but unfortunately it doesn't produce any reasonable results, probably due to MPS<->CPU data movement and fallbacks. So I've submitted the patch and marked it as a draft, in case someone can help :)

#12640
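For anyone picking this up, here is a minimal sketch (not taken from that patch) of how the MPS<->CPU fallback behaviour can be exercised with PyTorch's own switch; the ops shown are arbitrary supported examples.

```python
# PYTORCH_ENABLE_MPS_FALLBACK is PyTorch's own switch: with it set, ops that
# lack an MPS kernel silently run on the CPU instead of raising, which is
# convenient but can hide the data-movement and correctness problems
# described above.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # set before importing torch so the flag is picked up

import torch

if not torch.backends.mps.is_available():
    raise SystemExit("This sketch assumes an Apple-silicon Mac with the MPS backend available")

device = torch.device("mps")
x = torch.randn(8, 8, device=device)
y = (x @ x).softmax(dim=-1)  # these ops have MPS kernels; ops without one fall back to CPU under the flag
print(y.device)
```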
