[vllm] Support speculative decoding in vllm rolling batch #2413

xyang16 · 2024-10-02T20:19:58Z

Description

Brief description of what this PR is about

If this change is a backward incompatible change, why must this change be made?
Interesting edge cases to note here

tests/integration/llm/prepare.py

davidthomas426 · 2024-10-03T00:31:47Z

engines/python/setup/djl_python/properties_manager/vllm_rb_properties.py

@@ -59,6 +59,22 @@ class VllmRbProperties(Properties):
    enable_prefix_caching: Optional[bool] = False
    disable_sliding_window: Optional[bool] = False
    limit_mm_per_prompt: Optional[Mapping[str, int]] = None
+    use_v2_block_manager: bool = False


FYI: vllm-project/vllm#8678 (not merged yet, but changes this default to True in vLLM).
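
To make the concern above concrete, here is a minimal sketch of why the flag's value matters once the upstream default moves. It uses a plain dataclass as a stand-in for the real properties class. `RbPropertiesSketch` and `to_engine_kwargs` are hypothetical names for illustration only and are not part of djl_python or vLLM. The only detail taken from the diff is the field name and its `False` default.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class RbPropertiesSketch:
    """Hypothetical stand-in for VllmRbProperties (not the real class)."""
    enable_prefix_caching: bool = False
    disable_sliding_window: bool = False
    use_v2_block_manager: bool = False  # default as added in the diff above


def to_engine_kwargs(props: RbPropertiesSketch) -> Dict[str, Any]:
    """Hypothetical helper: forward every field explicitly.

    Because the flag is always passed, the value chosen here wins over
    whatever default the engine itself would apply.
    """
    return asdict(props)


default_kwargs = to_engine_kwargs(RbPropertiesSketch())
override_kwargs = to_engine_kwargs(RbPropertiesSketch(use_v2_block_manager=True))

print(default_kwargs["use_v2_block_manager"])
print(override_kwargs["use_v2_block_manager"])
```

If the properties class always forwards the flag like this, a `False` default here would keep overriding an upstream default of `True` until it is changed on this side as well.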

[vllm] Support speculative decoding in vllm rolling batch

d1ec11f

xyang16 requested review from zachgk and a team as code owners October 2, 2024 20:19

sindhuvahinis approved these changes Oct 2, 2024

tosterberg approved these changes Oct 2, 2024

tests/integration/llm/prepare.py (outdated, resolved)

tests/integration/llm/prepare.py (resolved)

Update

0788e72

xyang16 merged commit 0631414 into deepjavalibrary:master Oct 2, 2024
9 checks passed

davidthomas426 reviewed Oct 3, 2024
