[Doc] V1 user guide #13991

Open · wants to merge 23 commits into base: main · Changes from 10 commits
53 changes: 53 additions & 0 deletions docs/source/getting_started/v1_user_guide.md
@@ -0,0 +1,53 @@
# vLLM V1 User Guide

## Why vLLM V1?
For more details, check out the previous blog post: [vLLM V1: A Major Upgrade to vLLM's Core Architecture](https://blog.vllm.ai/2025/01/27/v1-alpha-release.html).
> **Review comment (Contributor):** This link is invalid, showing "404", please check~
>
> **Review comment (Contributor):** Did you mean: https://blog.vllm.ai/2025/01/27/v1-alpha-release.html ? Please change V1 to v1 in the link.


## Semantic changes and deprecated features

### Logprobs
- vLLM V1 now supports both sample logprobs and prompt logprobs, as introduced in this [PR](https://github.com/vllm-project/vllm/pull/9880).
- **Current Limitations**:
- V1 prompt logprobs do not support prefix caching.
- V1 logprobs are computed before logits post-processing, so penalty
adjustments and temperature scaling are not applied.
- The team is actively working on implementing logprobs that include post-sampling adjustments.
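
As a rough illustration of how these are requested (the model name and prompt below are placeholders, not part of this guide):

```python
from vllm import LLM, SamplingParams

# Placeholder model, used only for illustration.
llm = LLM(model="facebook/opt-125m")

# logprobs=5 requests up to the top-5 logprobs for each sampled token;
# prompt_logprobs=5 additionally requests logprobs for each prompt token.
params = SamplingParams(max_tokens=16, logprobs=5, prompt_logprobs=5)

output = llm.generate(["The capital of France is"], params)[0]
print(output.prompt_logprobs)       # prompt logprobs (not prefix-cached in V1)
print(output.outputs[0].logprobs)   # per-token sample logprobs
```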

### The following features have been deprecated in V1:

#### Deprecated sampling features
- best_of
- logits_processors (see the sketch after this list)
> **Review comment (Collaborator):**
> - "Per-request logits processors are not supported"
> - Nick is working on supporting these globally in the server
>
> **Reply (Contributor Author, @JenZhao, Mar 2, 2025):** Addressed in the updates from @ywang96. Thank you, Roger.

- beam_search
> **Review comment (Collaborator):** I believe beam search is supported.
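
For context, below is a minimal sketch of the kind of V0-style per-request logits processor that is no longer accepted in V1. The ban list and processor are hypothetical, and the callback signature should be checked against the vLLM version in use:

```python
from typing import List
import torch
from vllm import SamplingParams

BANNED_TOKEN_IDS = {42}  # hypothetical token ids to suppress

def ban_tokens(generated_token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
    # Mask out banned token ids before sampling by setting their logits to -inf.
    for token_id in BANNED_TOKEN_IDS:
        logits[token_id] = float("-inf")
    return logits

# In V0 a processor like this could be attached per request; V1 deprecates the field.
params = SamplingParams(max_tokens=16, logits_processors=[ban_tokens])
```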


#### Deprecated KV Cache features
- KV Cache swapping
- KV Cache offloading
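
For reference, V0 sized KV cache swapping through the `swap_space` engine argument; a minimal sketch follows (placeholder model, and whether V1 silently ignores or rejects the argument is not stated in this guide):

```python
from vllm import LLM

# V0-era setting: reserve 4 GiB of CPU memory for swapping out the KV cache of
# preempted requests. V1 deprecates KV cache swapping, so this no longer applies.
llm = LLM(model="facebook/opt-125m", swap_space=4)
```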

## Unsupported features

> **Review comment (Member):** Similar to my comment above for the first section: adding some context here will be helpful.

- **LoRA**: LoRA works for V1 on the main branch, but its performance is currently
inferior to that of V0. The team is actively working on improving it; see this
[PR](https://github.com/vllm-project/vllm/pull/13096). A usage sketch follows this list.

- **Spec Decode other than ngram**: currently, only ngram speculative decoding is supported in V1,
following this [PR](https://github.com/vllm-project/vllm/pull/12193). A configuration sketch follows this list.

- **Quantization**: In V1, when CUDA graphs are enabled, the engine defaults to the
piecewise CUDA graph introduced in this [PR](https://github.com/vllm-project/vllm/pull/10058); consequently, FP8 and other quantizations are not supported.

- **FP8 KV Cache**: FP8 KV Cache is not yet supported in V1.
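
As a rough illustration of the LoRA path mentioned above (the base model name, adapter name, and adapter path below are placeholders, not part of this guide):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model; enable_lora turns on adapter support.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    ["Summarize: vLLM is a fast inference engine."],
    SamplingParams(max_tokens=32),
    # Placeholder adapter name, integer id, and local path.
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```

And a sketch of enabling ngram speculative decoding; the argument names follow the V0-era engine arguments and may differ across vLLM versions:

```python
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model
    speculative_model="[ngram]",       # ngram prompt-lookup speculation
    num_speculative_tokens=5,          # number of draft tokens proposed per step
    ngram_prompt_lookup_max=4,         # largest n-gram to match against the prompt
)
```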

## Unsupported models
> **Review comment (Collaborator):**
> - Mamba not supported yet
> - We plan to support these eventually


All models with the `SupportsV0Only` tag in their model definition are not supported by V1.

- **Pooling Models**: Pooling models are not supported in V1 yet (a V0-style usage sketch follows this list).
- **Encoder-Decoder**: vLLM V1 is currently limited to decoder-only Transformers.
Please check out our
[documentation](https://docs.vllm.ai/en/latest/models/supported_models.html) for a
more detailed list of the supported models. Encoder-decoder model support is not
expected soon.
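
For reference, a sketch of how a pooling/embedding model is typically run in V0; the model name is a placeholder, and the exact `task` value ("embed" vs. "embedding") and output attributes depend on the vLLM version:

```python
from vllm import LLM

# Placeholder embedding model; task="embed" selects the pooling path.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")

# encode() returns one pooling output (e.g. an embedding) per prompt
# instead of generated text; this path is not yet available in V1.
outputs = llm.encode(["What is the capital of France?"])
print(outputs[0])
```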


## FAQ
2 changes: 2 additions & 0 deletions docs/source/index.md
@@ -67,6 +67,8 @@ getting_started/quickstart
getting_started/examples/examples_index
getting_started/troubleshooting
getting_started/faq
getting_started/v1_user_guide

:::

% What does vLLM support?