From ff71ca623d0d6c00e60a545adfa6db440415a672 Mon Sep 17 00:00:00 2001
From: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Date: Thu, 27 Feb 2025 11:55:13 -0800
Subject: [PATCH] Update v1_user_guide.md

---
 docs/source/getting_started/v1_user_guide.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/source/getting_started/v1_user_guide.md b/docs/source/getting_started/v1_user_guide.md
index a3c08cb553a6d..f2f9ca7ebdbd3 100644
--- a/docs/source/getting_started/v1_user_guide.md
+++ b/docs/source/getting_started/v1_user_guide.md
@@ -20,10 +20,9 @@ Previous blog post [vLLM V1: A Major Upgrade to vLLM's Core Architecture](https:
 - logits_processors
 - beam_search
 
-#### Deprecated KV Cache
+#### Deprecated KV Cache features
 
 - KV Cache swapping
 - KV Cache offloading
-- FP8 KV Cache
 
 ## Unsupported features
@@ -37,6 +36,8 @@
 - **Quantization**: For V1, when the CUDA graph is enabled, it defaults to the piecewise CUDA graph introduced in this [PR](https://github.com/vllm-project/vllm/pull/10058); consequently, FP8 and other quantizations are not supported.
 
+- **FP8 KV Cache**: FP8 KV Cache is not yet supported in V1.
+
 ## Unsupported models
 
 All models with the `SupportsV0Only` tag in the model definition are not supported by V1.
 
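
A note to accompany this patch: with FP8 KV Cache now listed under unsupported features, a workload that depends on it has to stay on the V0 engine until V1 support lands. Below is a minimal sketch of that fallback, assuming the `VLLM_USE_V1` environment variable and the `kv_cache_dtype` engine argument behave as in vLLM releases contemporary with this patch; `facebook/opt-125m` is only a placeholder model.

```python
import os

# Opt out of the V1 engine before importing vLLM, since the engine
# selection is read from the environment during initialization.
os.environ["VLLM_USE_V1"] = "0"

from vllm import LLM, SamplingParams

# FP8 KV cache is listed as V0-only / not yet supported in V1 by the
# patched guide, so this configuration targets the V0 engine.
llm = LLM(model="facebook/opt-125m", kv_cache_dtype="fp8")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

Setting the environment variable before the `vllm` import is the safe ordering here, as some versions cache engine-selection flags at import time.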