Support Mac Metal #93
Thank you for the suggestion! I recently acquired an M4 device, which should help with further development on macOS. However, the primary goal of this project is to port vLLM (Paged Attention) to Rust. At this time, I am unsure whether Paged Attention is supported by, or can be implemented using, Metal. If you're specifically interested in LLM inference on macOS within the Rust ecosystem, you might want to explore mistral.rs, another project by Eric (@EricLBuehler) that focuses on this area.
Hi @zuowenjian! I added CUDA PagedAttention to mistral.rs some time ago, and I am now actively working on a Metal implementation. @guoqingbao, congrats on the M4! The PR implementing PagedAttention on Metal is here: EricLBuehler/mistral.rs#1001. It already compiles and runs (I've fully integrated the kernels, backend interop, scheduling, etc.), but it does not yet produce correct output. I'm not sure how much experience you have with Metal, but perhaps you have time to take a look? If you find anything, I can add you as a collaborator so you can push to the branch!
Great work, Eric! Supporting Paged Attention on Apple Silicon is a promising direction. I've reviewed your PR, but I'm not sure whether the Metal Paged Attention kernel you ported from CUDA is working as intended. Have you tested the paged_attention Metal kernel in isolation before integrating it into mistral.rs? That might help narrow down the cause of the incorrect output.
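One way to test a paged attention kernel in isolation is to compare its output against a simple CPU reference on small, hand-built inputs. The sketch below is a hypothetical single-query, single-head reference in plain Rust (no candle or Metal APIs). The function name `paged_attention_ref` and the `[physical_block][slot][head_dim]` cache layout are assumptions for illustration, not the layout used by vLLM, candle-vllm, or mistral.rs.

```rust
/// Hypothetical CPU reference for single-query, single-head paged attention,
/// intended only for cross-checking a GPU kernel. Assumed cache layout:
/// key_cache[physical_block][slot_in_block][head_dim] (same for values).
fn paged_attention_ref(
    query: &[f32],
    key_cache: &[Vec<Vec<f32>>],
    value_cache: &[Vec<Vec<f32>>],
    block_table: &[usize], // logical block index -> physical block index
    context_len: usize,
    block_size: usize,
    scale: f32,
) -> Vec<f32> {
    let head_dim = query.len();
    // Map a logical token position to (physical block, slot) via the block table.
    let locate = |t: usize| (block_table[t / block_size], t % block_size);

    // Scaled dot-product score for every token in the context.
    let scores: Vec<f32> = (0..context_len)
        .map(|t| {
            let (blk, slot) = locate(t);
            let k = &key_cache[blk][slot];
            query.iter().zip(k).map(|(q, k)| q * k).sum::<f32>() * scale
        })
        .collect();

    // Numerically stable softmax: subtract the max score before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Softmax-weighted sum of value vectors, gathered through the block table.
    let mut out = vec![0.0f32; head_dim];
    for (t, e) in exps.iter().enumerate() {
        let (blk, slot) = locate(t);
        let w = e / sum;
        for (o, v) in out.iter_mut().zip(&value_cache[blk][slot]) {
            *o += w * v;
        }
    }
    out
}

fn main() {
    // Logical block 0 lives in physical block 1 and logical block 1 in physical
    // block 0, so every lookup genuinely goes through the block table.
    let key_cache: Vec<Vec<Vec<f32>>> = vec![
        vec![vec![0.5, 0.5], vec![9.0, 9.0]], // slot 1 is past context_len, ignored
        vec![vec![1.0, 0.0], vec![0.0, 1.0]],
    ];
    let value_cache: Vec<Vec<Vec<f32>>> = vec![
        vec![vec![6.0, 3.0], vec![100.0, 100.0]],
        vec![vec![2.0, 0.0], vec![4.0, 0.0]],
    ];
    let out = paged_attention_ref(&[1.0, 0.0], &key_cache, &value_cache, &[1, 0], 3, 2, 1.0);
    println!("{:?}", out);
}
```

Running the same inputs through the Metal kernel and comparing element-wise (with a small tolerance) separates kernel bugs from integration bugs. A scrambled block table, as in `main`, is deliberate: it catches indexing mistakes that a contiguous layout would hide.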
Given that Eric has successfully ported the paged attention kernels to the Metal platform (EricLBuehler/mistral.rs#1001), we can now support Mac devices in the candle-vllm project.
@guoqingbao, thanks for the great work in #97! @zuowenjian, I think Mac Metal is now supported!
The candle project has already released version 0.8, which supports Metal acceleration on macOS. I would like candle-vllm to support Metal acceleration on macOS as well.