# vLLM V1 User Guide

## Why vLLM V1?
For the background and motivation behind the V1 rewrite, see the previous blog post: [vLLM V1: A Major Upgrade to vLLM's Core Architecture](https://blog.vllm.ai/2025/01/27/v1-alpha-release.html).

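A minimal sketch of opting in to V1 with the offline `LLM` API, assuming the `VLLM_USE_V1` environment variable is still the switch on your build; the model name is only an example:

```python
import os

# Opt in to the V1 engine before vLLM creates the engine.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any supported decoder-only model
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```
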
## Semantic changes and deprecated features

### Logprobs
- vLLM V1 now supports both sample logprobs and prompt logprobs, as introduced in this [PR](https://github.com/vllm-project/vllm/pull/9880) (a request sketch follows this list).
- **Current Limitations**:
  - V1 prompt logprobs do not support prefix caching.
  - V1 logprobs are computed before logits post-processing, so penalty adjustments and temperature scaling are not applied.
- The team is actively working on implementing logprobs that include post-sampling adjustments.

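A minimal sketch of requesting both kinds of logprobs through the offline `LLM` API; the model name is only an example:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model

# Ask for up to 5 logprobs per sampled token and per prompt token.
params = SamplingParams(max_tokens=16, logprobs=5, prompt_logprobs=5)
outputs = llm.generate(["The capital of France is"], params)

for out in outputs:
    print(out.prompt_logprobs)       # one entry per prompt token (the first is None)
    print(out.outputs[0].logprobs)   # one logprob dict per generated token
```
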
### The following features have been deprecated in V1:

#### Deprecated sampling features
- `best_of`
- `logits_processors`
- `beam_search`

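For reference, these correspond to V0-era `SamplingParams` usage roughly like the sketch below; it illustrates what no longer takes effect in V1 and is not a V1 API:

```python
from vllm import SamplingParams

def ban_token_zero(token_ids, logits):
    # Example V0-style logits processor: suppress token id 0.
    logits[0] = -float("inf")
    return logits

# V0-era fields that are deprecated in V1 (shown for identification only).
params = SamplingParams(
    n=1,
    best_of=4,                            # deprecated in V1
    logits_processors=[ban_token_zero],   # deprecated in V1
)
```
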
#### Deprecated KV Cache features
- KV Cache swapping
- KV Cache offloading
- FP8 KV Cache

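These map to V0-era engine arguments along the lines of the sketch below; the names are shown only to identify the features and are not supported by the V1 engine:

```python
from vllm import LLM

# V0-era configuration tied to the deprecated KV cache features (illustrative only).
llm = LLM(
    model="facebook/opt-125m",
    swap_space=4,           # GiB of CPU swap space backing KV cache swapping in V0
    kv_cache_dtype="fp8",   # FP8 KV cache
)
```
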
## Unsupported features

- **LoRA**: LoRA works for V1 on the main branch, but its performance is inferior to that of V0. The team is actively working on improving the performance; see this [PR](https://github.com/vllm-project/vllm/pull/13096). A usage sketch follows.

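A minimal offline sketch of running a LoRA adapter on V1; the base model, adapter name, and adapter path are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Enable LoRA support in the engine (works on V1 main, currently slower than V0).
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

# Placeholder adapter name, integer id, and local path.
lora = LoRARequest("example-adapter", 1, "/path/to/lora-adapter")

outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=32),
    lora_request=lora,
)
print(outputs[0].outputs[0].text)
```
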
- **Spec Decode other than ngram**: currently, only ngram spec decode is supported in V1, after this [PR](https://github.com/vllm-project/vllm/pull/12193). A configuration sketch follows.

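A sketch of enabling ngram speculative decoding with the offline API; the argument names follow the V0-style engine arguments and may differ slightly on main, and the target model is a placeholder:

```python
from vllm import LLM

# Ngram speculative decoding: draft tokens come from prompt n-gram lookup,
# the only speculative decoding method V1 supports so far.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    speculative_model="[ngram]",
    num_speculative_tokens=5,
    ngram_prompt_lookup_max=4,
)
```
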
- **Quantization**: For V1, when the CUDA graph is enabled, it defaults to the piecewise CUDA graph introduced in this [PR](https://github.com/vllm-project/vllm/pull/10058); consequently, FP8 and other quantizations are not supported. An example of such a configuration follows.

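For illustration only, this is the kind of quantized setup that the note above says V1 cannot run yet; `quantization="fp8"` is the existing engine argument and the model name is a placeholder:

```python
from vllm import LLM

# Requesting FP8 quantization; per the note above, V1's piecewise CUDA graph path
# does not support this yet.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    quantization="fp8",
)
```
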
## Unsupported Models
All models with the `SupportsV0Only` tag in the model definition are not supported by V1; a sketch of how the tag appears follows the list below.

- **Pooling Models**: Pooling models are not supported in V1 yet.
- **Encoder-Decoder**: vLLM V1 is currently limited to decoder-only Transformers. Please check out our [documentation](https://docs.vllm.ai/en/latest/models/supported_models.html) for a more detailed list of the supported models. Support for encoder-decoder models is not expected soon.

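Purely as an illustration of the `SupportsV0Only` tag, an unsupported model class is marked roughly like this; the import path and class name are assumptions, not a real vLLM model:

```python
import torch.nn as nn

# Assumed location of the tag; check vllm/model_executor/models/interfaces.py
# in your checkout for the actual definition.
from vllm.model_executor.models.interfaces import SupportsV0Only


class SomeV0OnlyModel(nn.Module, SupportsV0Only):
    """Hypothetical model definition carrying the V0-only marker; V1 skips such models."""
    ...
```
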
## FAQ