[BUGFIX] Skip tokenization support for throughput benchmark #12712
Conversation
Signed-off-by: root <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
- Add `ready` label to the PR
- Enable auto-merge.
🚀
Thanks. Will approve if it works for v1.
Sorry for the delay, I just got spare time to test it, and V1 is working :) So both work well and show a throughput improvement.
On V0, I see a lot of warning spam like the following when running with `--skip_tokenizer_init`. Have you noticed this as well?
WARNING 03-04 20:39:37 [preprocess.py:59] Using None for EOS token id because tokenizer is not initialized
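For context, a minimal sketch of how this warning surfaces (the model name and token ids are placeholders): with `skip_tokenizer_init=True` the engine never loads a tokenizer, so it cannot resolve the model's EOS token id and falls back to `None`.

```python
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

# No tokenizer is loaded, so preprocessing cannot look up the EOS token id
# and logs the "Using None for EOS token id" warning for each request.
llm = LLM(model="facebook/opt-125m", skip_tokenizer_init=True)

# Without a tokenizer, prompts must already be token ids.
prompts = [TokensPrompt(prompt_token_ids=[1, 2, 3, 4])]

# EOS is unknown, so generation stops only at max_tokens unless explicit
# stop_token_ids are given; detokenize=False avoids needing a tokenizer
# for the outputs.
params = SamplingParams(max_tokens=16, detokenize=False)
outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].token_ids)
```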
Please merge in latest main
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Aleksandr Malyshev <[email protected]>
I don't think we support the `skip_tokenizer_init` option yet in V1 (we should make sure there's a check on the V1 path that raises an appropriate error).

@maleksan85 WDYT about handling this in the benchmarks without using that option - in particular having explicit `skip_tokenization` and `skip_detokenization` benchmark options, where the former controls whether token_ids are used instead of text prompts (per your change in this PR), and the latter is handled by #11697?

Update: this should work in V1 now that #14224 is merged.
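If it helps the discussion, a hypothetical sketch of what those options could look like in a benchmark script (the flag names come from the proposal above; the wiring and helper are illustrative, not merged code):

```python
import argparse

from vllm import SamplingParams
from vllm.inputs import TokensPrompt

parser = argparse.ArgumentParser()
# Flag names taken from the proposal; defaults and help text are guesses.
parser.add_argument("--skip-tokenization", action="store_true",
                    help="Send pre-tokenized token ids instead of text prompts.")
parser.add_argument("--skip-detokenization", action="store_true",
                    help="Do not convert generated token ids back to text.")
args = parser.parse_args()

def build_prompt(prompt_text: str, tokenizer):
    # --skip-tokenization: tokenize once up front, outside the measured loop,
    # and hand the engine raw token ids.
    if args.skip_tokenization:
        return TokensPrompt(prompt_token_ids=tokenizer.encode(prompt_text))
    return prompt_text

# --skip-detokenization maps onto SamplingParams.detokenize (cf. #11697).
sampling_params = SamplingParams(
    max_tokens=128, detokenize=not args.skip_detokenization)
```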
This is a BUGFIX PR that restores functionality to exclude tokenize/detokenize from model inference. Tried it on V1; haven't seen any errors. If you want an extra flag (which the vLLM community was against when I created this PR), sure; however, I personally don't see a reason for new flag(s).
/ready
@comaniac would you be able to help land this PR?
Otherwise LGTM
Great, please mark it ready then if there are no other concerns.
@maleksan85 apologies, LGTM too. Would it make sense to apply a similar change to the other benchmark scripts: `benchmark_latency.py`, `benchmark_prefix_caching.py`, `benchmark_prioritization.py`?
As far as I understand, `benchmark_latency.py` only sends token ids instead of text prompts, so it inherently supports `skip_tokenizer_init` - that is, the same behavior as the change I made here for the throughput benchmark (see vllm/benchmarks/benchmark_latency.py, line 58 at d0feea3).
For the rest of the benchmarks, it is up to those functionalities' developers.
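For reference, the pattern being described - dummy token ids fed straight to the engine, so no tokenizer call sits inside the timed region - looks roughly like this (a sketch only; shapes, model name, and values are placeholders, not the literal benchmark code):

```python
import numpy as np

from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

batch_size, input_len, output_len = 8, 128, 128

# Random ids stand in for real prompts; nothing here touches a tokenizer.
dummy_ids = np.random.randint(10000, size=(batch_size, input_len))
prompts = [TokensPrompt(prompt_token_ids=row.tolist()) for row in dummy_ids]

llm = LLM(model="facebook/opt-125m", skip_tokenizer_init=True)
params = SamplingParams(max_tokens=output_len, ignore_eos=True,
                        detokenize=False)

# In the latency benchmark, only this call would sit inside the timer.
outputs = llm.generate(prompts, params)
```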
Fixing support for vLLM mode to run without a tokenizer in the throughput benchmark.
Recreating the accidentally corrupted PR: #12489
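Roughly, the behavior this PR restores in benchmark_throughput.py can be sketched as follows (a simplified illustration assuming requests carry pre-computed token ids; the request tuple shape and function name are hypothetical, not the literal diff):

```python
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

def run_vllm(requests, model: str, skip_tokenizer_init: bool = False):
    """requests: (prompt_text, prompt_token_ids, output_len) tuples -
    a hypothetical shape, not the benchmark's actual data structure."""
    llm = LLM(model=model, skip_tokenizer_init=skip_tokenizer_init)
    prompts, params = [], []
    for prompt_text, prompt_token_ids, output_len in requests:
        if skip_tokenizer_init:
            # No tokenizer in the engine: requests must carry token ids.
            prompts.append(TokensPrompt(prompt_token_ids=prompt_token_ids))
        else:
            prompts.append(prompt_text)
        params.append(SamplingParams(max_tokens=output_len, ignore_eos=True,
                                     detokenize=not skip_tokenizer_init))
    return llm.generate(prompts, params)
```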