
Commit

Disable chunked prefill and/or prefix caching when MLA is enabled (#12642)

From @mgoin in #12638

I cannot push to that branch, so this is a new PR to unblock the release.

---------

Signed-off-by: mgoin <[email protected]>
Signed-off-by: simon-mo <[email protected]>
Co-authored-by: mgoin <[email protected]>
simon-mo and mgoin authored Feb 1, 2025
1 parent 1e36983 commit 4f4d427
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions vllm/config.py

@@ -3252,6 +3252,16 @@ def __post_init__(self):
 
         current_platform.check_and_update_config(self)
 
+        # If MLA is enabled, force disable chunked prefill and prefix caching
+        if self.model_config and self.model_config.use_mla:
+            logger.info("MLA is enabled; forcing chunked prefill and prefix "
+                        "caching to be disabled.")
+            self.scheduler_config.enable_chunked_prefill = False
+            self.scheduler_config.chunked_prefill_enabled = False
+
+            if self.cache_config is not None:
+                self.cache_config.enable_prefix_caching = False
+
         if not self.instance_id:
             self.instance_id = random_uuid()[:5]
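The override behavior in the hunk can be sketched in isolation with simplified stand-in config classes. Everything below (`ModelConfig`, `SchedulerConfig`, `CacheConfig`, `VllmLikeConfig`) is a hypothetical simplification for illustration, not vLLM's actual API; only the override logic in `__post_init__` mirrors the added lines:

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for vLLM's config objects, reduced to the
# fields the new hunk touches.
@dataclass
class ModelConfig:
    use_mla: bool = False


@dataclass
class SchedulerConfig:
    enable_chunked_prefill: bool = True
    chunked_prefill_enabled: bool = True


@dataclass
class CacheConfig:
    enable_prefix_caching: bool = True


@dataclass
class VllmLikeConfig:
    model_config: Optional[ModelConfig] = None
    scheduler_config: SchedulerConfig = field(default_factory=SchedulerConfig)
    cache_config: Optional[CacheConfig] = None

    def __post_init__(self):
        # Mirror of the added hunk: when MLA is on, both chunked
        # prefill and prefix caching are forced off, regardless of
        # what the user requested.
        if self.model_config and self.model_config.use_mla:
            self.scheduler_config.enable_chunked_prefill = False
            self.scheduler_config.chunked_prefill_enabled = False
            if self.cache_config is not None:
                self.cache_config.enable_prefix_caching = False


# User-requested settings are silently overridden when use_mla=True.
cfg = VllmLikeConfig(model_config=ModelConfig(use_mla=True),
                     cache_config=CacheConfig(enable_prefix_caching=True))
print(cfg.scheduler_config.enable_chunked_prefill)  # False
print(cfg.cache_config.enable_prefix_caching)       # False
```

Note the guard on `self.cache_config is not None`: prefix caching is only touched when a cache config exists, while the scheduler fields are overridden unconditionally once MLA is detected.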
