The feature you mentioned depends heavily on business requirements and real-world scenarios. It's a promising area for improvement (see https://github.com/kvcache-ai/Mooncake), and @Jeffwan might also be interested in it. While it's on our roadmap (#634), its priority and implementation complexity mean the short-term ROI is limited, so for now we'll keep tracking its progress. Thanks for your interest!
Motivation
Related work includes:
1. https://github.com/LLMServe/DistServe/tree/main
2. vllm-project/vllm#2809
3. Mooncake has shown that separating the prefill and decode phases can improve throughput and yield significant cost savings for online services. Are there any plans to do this?
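To make the request concrete, here is a minimal toy sketch of the prefill/decode disaggregation idea referenced above. All names are hypothetical and the tensors are stand-ins; this is not the Mooncake or DistServe API, just an illustration of why the two phases can run on separate workers with a KV-cache transfer between them.

```python
# Toy sketch of prefill/decode disaggregation (illustrative only).

def prefill(prompt_tokens):
    # Compute-bound phase: process the whole prompt once and produce
    # the KV cache plus the first generated token.
    kv_cache = [(t, t) for t in prompt_tokens]  # stand-in for real K/V tensors
    first_token = len(prompt_tokens)            # stand-in for model output
    return kv_cache, first_token

def decode(kv_cache, first_token, max_new_tokens):
    # Memory-bound phase: generate one token at a time, appending to the
    # transferred KV cache. In a disaggregated setup this runs on a
    # separate decode worker, so the two phases don't contend for GPU time.
    out = [first_token]
    for _ in range(max_new_tokens - 1):
        nxt = out[-1] + 1                       # stand-in for sampling
        kv_cache.append((nxt, nxt))
        out.append(nxt)
    return out

# The prefill worker handles the prompt, then ships the KV cache
# (e.g. over RDMA in real systems) to a decode worker.
cache, tok = prefill([10, 11, 12])
print(decode(cache, tok, 4))  # [3, 4, 5, 6]
```

The throughput win comes from batching the compute-bound prefill and memory-bound decode independently, each on hardware sized for its bottleneck.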
Related resources
No response