support mac metal #93

zuowenjian · 2024-11-13T07:15:39Z

The candle project has already released version 0.8, which supports Metal acceleration on macOS. I would like candle-vllm to also support Metal acceleration on macOS.

guoqingbao · 2024-12-11T05:59:19Z

Thank you for the suggestion! I recently acquired an M4 device, which I believe will help facilitate further development on macOS. However, the primary goal of this project is to port vLLM (Paged Attention) to the Rust platform. At this time, I am unsure whether Paged Attention is supported or can be implemented using Metal. If you're specifically interested in LLM inference on macOS within the Rust ecosystem, you might want to explore another project by Eric @EricLBuehler that focuses on this area (mistral.rs).

EricLBuehler · 2024-12-22T19:34:43Z

Hi @zuowenjian! I added CUDA PagedAttention to mistral.rs some time ago, but I am now actively working on a Metal implementation.

@guoqingbao congrats on the M4! The PR for implementing PagedAttention on Metal can be found here: EricLBuehler/mistral.rs#1001. It's currently already able to run & compile (I've fully integrated the kernels, backend interop, scheduling, etc), but it doesn't produce correct output. I'm not sure what your experience with Metal is, but perhaps you have time to take a look? I can add you as a collaborator to push to the branch if you find anything!

guoqingbao · 2024-12-23T06:45:02Z

Great work, Eric! Supporting Paged Attention on Apple Silicon is a promising direction. I've reviewed your PR, but I'm uncertain whether the Metal kernel for Paged Attention that you ported from CUDA is functioning as intended. Have you tested the paged_attention metal kernel in isolation before integrating it into mistral.rs? This might help narrow down the issue with the incorrect output.
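One way to test such a kernel in isolation is to compare its output against a small CPU reference on the same inputs. The sketch below is only an illustration of that idea, not the kernel from EricLBuehler/mistral.rs#1001 or anything in candle-vllm: the function names (`attention_contiguous`, `attention_paged`, `scatter`) and the cache layout (`cache[block][slot]` holding one token's vector, single head, single query) are assumptions made for the example. It computes attention once over contiguous keys/values and once through a block table, so the two results can be checked against each other and, in a real test, against what the GPU kernel returns.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Naive single-head attention for one query over contiguous keys/values.
fn attention_contiguous(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = keys.iter().map(|k| dot(q, k) * scale).collect();
    // Numerically stable softmax: subtract the maximum score first.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // Weighted sum of the value vectors.
    let mut out = vec![0.0f32; values[0].len()];
    for (e, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (e / sum) * x;
        }
    }
    out
}

/// Same computation, but K/V are read through a block table.
/// Assumed layout (hypothetical): cache[block][slot] is one token's vector.
fn attention_paged(
    q: &[f32],
    key_cache: &[Vec<Vec<f32>>],
    value_cache: &[Vec<Vec<f32>>],
    block_table: &[usize],
    block_size: usize,
    context_len: usize,
) -> Vec<f32> {
    // Token t lives in physical block block_table[t / block_size], slot t % block_size.
    let gather = |cache: &[Vec<Vec<f32>>]| -> Vec<Vec<f32>> {
        (0..context_len)
            .map(|t| cache[block_table[t / block_size]][t % block_size].clone())
            .collect()
    };
    attention_contiguous(q, &gather(key_cache), &gather(value_cache))
}

/// Test helper: scatter contiguous per-token vectors into a paged cache.
fn scatter(
    tokens: &[Vec<f32>],
    block_table: &[usize],
    block_size: usize,
    num_blocks: usize,
) -> Vec<Vec<Vec<f32>>> {
    let dim = tokens[0].len();
    let mut cache = vec![vec![vec![0.0f32; dim]; block_size]; num_blocks];
    for (t, v) in tokens.iter().enumerate() {
        cache[block_table[t / block_size]][t % block_size] = v.clone();
    }
    cache
}
```

In a standalone kernel test, the same query, caches, and block table would be uploaded to the device, and the kernel output compared element-wise against `attention_paged` within a small tolerance. A mismatch there points at the kernel itself rather than at scheduling or backend integration.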

guoqingbao · 2024-12-30T06:49:25Z

Given that Eric has successfully ported paged attention kernels to the Metal platform (EricLBuehler/mistral.rs#1001), we can now support Mac devices in the candle-vllm project.

EricLBuehler · 2024-12-31T16:23:19Z

@guoqingbao thanks for the great work in #97!

@zuowenjian I think that Mac Metal is now supported!

guoqingbao added the enhancement label (New feature or request) Dec 11, 2024

guoqingbao self-assigned this Dec 11, 2024

EricLBuehler self-assigned this Dec 22, 2024
