[Feature][Frontend] Add KVTransferParams for disaggregated prefill feature #12957

Open

wants to merge 14 commits into main from add_kv_transfer_params
Conversation

ShangmingCai
Contributor

@ShangmingCai ShangmingCai commented Feb 8, 2025

This is the initial PR to implement XpYd as described in the design doc, which has gone through several rounds of discussion in the #feat-prefill-disaggregation Slack channel and reached a preliminary consensus.

This PR mainly includes the changes related to KVTransferParams. In the XpYd implementation, we will use this parameter to coordinate metadata among all vLLM instances (prefill nodes and decode nodes), the KVStore Server, and the global proxy. With this param, the global proxy can attach the disaggregated prefill metadata to the CompletionRequest or ChatCompletionRequest, and all kv-transfer-related modules can access it through the model_input in model_runner.execute_model and use these params to interact with the KVStore Server at the right place under the current 1P1D implementation.

Based on this initial PR, the Mooncake team will implement the KVStoreConnector in the next PR, which utilizes KVTransferParams and supports several third-party key-value stores acting as the KVLookupBuffer, such as MooncakeStore, Valkey, and LMCache. With KVStoreConnector, we can decouple the prefill nodes from the decode nodes, making the entire XpYd implementation simpler and more stable.

The metadata required in KVTransferParams will be provided by a global proxy (or, put differently, the XpYd orchestration mechanism), so a global proxy interface will also be added in the next PR. The Mooncake team will implement a simple round-robin version; in production, service providers and users can implement their own global proxy strategies according to their needs.
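To make this concrete, here is a minimal sketch of what a proxy-side request carrying this metadata could look like. Only the prefix_prompt_ids field comes from this PR's diff; the decode-node address, model name, and token id values are placeholders for illustration and not part of the PR.

```python
# Hypothetical proxy-side sketch (not part of this PR): forward a request to a
# decode instance with disaggregated-prefill metadata attached. Only the
# prefix_prompt_ids field comes from this PR's diff; everything else here is a
# placeholder.
import requests

DECODE_NODE_URL = "http://decode-node-0:8000/v1/chat/completions"  # placeholder address

payload = {
    "model": "example-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Explain disaggregated prefill."}],
    # KV transfer metadata coordinated by the global proxy; under 1P1D these
    # ids essentially mirror the prompt token ids.
    "prefix_prompt_ids": [1, 15043, 29892, 3186],  # example token ids
}

response = requests.post(DECODE_NODE_URL, json=payload, timeout=60)
print(response.json())
```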

CC list: @KuntaiDu, @youkaichao, @james0zan, @alogfans, @stmatengss, @zeroorhero, @pizhenwei


github-actions bot commented Feb 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the frontend label Feb 8, 2025
Signed-off-by: Shangming Cai <[email protected]>
Signed-off-by: Shangming Cai <[email protected]>
Signed-off-by: Shangming Cai <[email protected]>
Signed-off-by: Shangming Cai <[email protected]>
Signed-off-by: Shangming Cai <[email protected]>
@ShangmingCai ShangmingCai force-pushed the add_kv_transfer_params branch from 6cdcc91 to a45736e Compare February 11, 2025 04:03
Signed-off-by: Shangming Cai <[email protected]>
@ShangmingCai ShangmingCai changed the title [WIP] Add KVTransferParams for disaggregated prefill feature [Feature][Frontend] Add KVTransferParams for disaggregated prefill feature Feb 11, 2025
@ShangmingCai
Contributor Author

@KuntaiDu Hello, this is the initial PR for the XpYd design as we discussed before the Chinese New Year. I think it is ready for review :)

@ShangmingCai ShangmingCai force-pushed the add_kv_transfer_params branch from 70db613 to 1f194db Compare February 12, 2025 11:15
Signed-off-by: Shangming Cai <[email protected]>
Signed-off-by: Shangming Cai <[email protected]>
@ShangmingCai
Contributor Author

@youkaichao I have verified that the distributed-tests-4-gpus CI failure can be solved by #12959. However, the image created by docker-build-image does not include that fix, so re-triggering CI without rebasing does not resolve a failure that existed in main when this PR was created but has since been fixed. Also, ever since DCO was introduced, I have found rebasing difficult. Is there a better way to rebase or merge without triggering DCO issues, so that we can pick up fixes for CI failures introduced by stale main-branch code from when this PR was created?

@@ -240,6 +240,11 @@ class ChatCompletionRequest(OpenAIBaseModel):
parallel_tool_calls: Optional[bool] = False
user: Optional[str] = None

# kv transfer params
prefix_prompt_ids: Optional[List[int]] = None
Collaborator

Since it is List[int], it seems to require the orchestrator to run per-request tokenization. Will that limit the scalability of the orchestrator?

Contributor Author

In practice, if we were only implementing the XpYd scheme, the orchestrator would only need to handle request routing between P-nodes and D-nodes, along with managing the keys for KV cache transmission. However, to stay compatible with future prefix-caching features and let the proxy maintain a prefix tree built from token ids for more content-aware disaggregated prefill scheduling, this parameter is introduced to support broader KV cache transfer options and scheduling scenarios. In fact, it could be omitted in the first XpYd implementation, since prefix_prompt_ids should essentially match prompt_token_ids in disaggregated prefill scenarios.
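For illustration, here is a rough sketch of how an orchestrator could populate this field if it does run tokenization. It assumes a HuggingFace tokenizer with placeholder names and is not part of this PR; as noted above, the field can simply be omitted in the first XpYd implementation.

```python
# Hypothetical orchestrator-side sketch: tokenize once in the proxy so it can
# both populate prefix_prompt_ids and maintain a prefix tree over token ids
# for content-aware scheduling. Not part of this PR.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("example-org/example-model")  # placeholder

def build_request(prompt: str) -> dict:
    token_ids = tokenizer.encode(prompt)
    return {
        "model": "example-org/example-model",  # placeholder
        "prompt": prompt,
        # In disaggregated prefill, these should essentially match
        # prompt_token_ids (see the reply above).
        "prefix_prompt_ids": token_ids,
    }
```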

priority=request.priority,
)
# Note(shangming): v1 does not support KV transfer yet.
if isinstance(self.engine_client, AsyncLLM):
Collaborator

Let me discuss with other people and see how to deal with KVTransferParams in v1.

@@ -0,0 +1,71 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

Just want to learn more about the rationale behind putting KVTransferParams in a separate file instead of adding it to an existing one.

Contributor Author

Good question. Initially, I planned to add it to sampling_params.py, but since the KV transfer process is unrelated to sampling, combining them might make the code harder to understand. I referenced the pooling_params.py implementation and used a separate file for this class to improve readability. Regarding the location, placing it in the root directory (like the other params files) works, but moving it to the vllm/distributed/kv_transfer directory is also an option. Let me know which approach you prefer!
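For reference, a rough sketch of what such a standalone file could contain, modeled loosely on the standalone-params pattern mentioned above. Only prefix_prompt_ids comes from this PR's diff; the docstring and the rest are assumptions for illustration, not the actual file contents.

```python
# SPDX-License-Identifier: Apache-2.0
# Hypothetical sketch of a standalone kv_transfer_params.py; not the actual
# file from this PR.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KVTransferParams:
    """Parameters used to coordinate KV cache transfer among prefill
    instances, decode instances, the KVStore Server, and the global proxy
    in a disaggregated prefill (XpYd) deployment."""

    # Token ids of the prompt prefix; under 1P1D these essentially match
    # prompt_token_ids (see discussion above).
    prefix_prompt_ids: Optional[List[int]] = None
```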

@KuntaiDu KuntaiDu self-assigned this Feb 12, 2025

mergify bot commented Feb 12, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ShangmingCai.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 12, 2025
@mergify mergify bot removed the needs-rebase label Feb 12, 2025