[BUGFIX] Skip tokenization support for throughput benchmark #12712
Conversation
Signed-off-by: root <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
- Add `ready` label to the PR
- Enable auto-merge.
🚀
Thanks. Will approve if it works for v1.
Sorry for the delay, I just got spare time to test it, and V1 is working :) So both work well and show a throughput improvement.
On V0, I see a lot of warning spam like the following when running with `--skip_tokenizer_init`. Have you noticed this as well?
WARNING 03-04 20:39:37 [preprocess.py:59] Using None for EOS token id because tokenizer is not initialized
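For context, a minimal sketch of how this warning surfaces (the model name and token ids are placeholders): with `skip_tokenizer_init=True` the engine never loads a tokenizer, so it cannot resolve the model's EOS token id and falls back to `None`.

```python
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

# No tokenizer is loaded, so preprocessing cannot look up the EOS token id
# and logs the "Using None for EOS token id" warning for each request.
llm = LLM(model="facebook/opt-125m", skip_tokenizer_init=True)

# Without a tokenizer, prompts must already be token ids.
prompts = [TokensPrompt(prompt_token_ids=[1, 2, 3, 4])]

# EOS is unknown, so generation stops only at max_tokens unless explicit
# stop_token_ids are given; detokenize=False avoids needing a tokenizer
# for the outputs.
params = SamplingParams(max_tokens=16, detokenize=False)
outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].token_ids)
```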
Please merge in latest main
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Aleksandr Malyshev <[email protected]>
I don't think we support the `skip_tokenizer_init` option yet in V1 (we should make sure there's a check on the V1 path that raises an appropriate error).

@maleksan85 WDYT about handling this in the benchmarks without using that option - in particular having explicit `skip_tokenization` and `skip_detokenization` benchmark options, where the former controls whether token_ids are used instead of text prompts (per your change in this PR), and the latter is handled by #11697?

Update: this should work in V1 now that #14224 is merged.
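If it helps the discussion, a hypothetical sketch of what those options could look like in a benchmark script (the flag names come from the proposal above; the wiring and helper are illustrative, not merged code):

```python
import argparse

from vllm import SamplingParams
from vllm.inputs import TokensPrompt

parser = argparse.ArgumentParser()
# Flag names taken from the proposal; defaults and help text are guesses.
parser.add_argument("--skip-tokenization", action="store_true",
                    help="Send pre-tokenized token ids instead of text prompts.")
parser.add_argument("--skip-detokenization", action="store_true",
                    help="Do not convert generated token ids back to text.")
args = parser.parse_args()

def build_prompt(prompt_text: str, tokenizer):
    # --skip-tokenization: tokenize once up front, outside the measured loop,
    # and hand the engine raw token ids.
    if args.skip_tokenization:
        return TokensPrompt(prompt_token_ids=tokenizer.encode(prompt_text))
    return prompt_text

# --skip-detokenization maps onto SamplingParams.detokenize (cf. #11697).
sampling_params = SamplingParams(
    max_tokens=128, detokenize=not args.skip_detokenization)
```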
This is a BUGFIX PR that restores functionality to exclude tokenize/detokenize from model inference. Tried it on V1; haven't seen any errors. If you want an extra flag (which the vLLM community was against when I created this PR), sure; however, I personally don't see a reason for new flag(s).
/ready
@comaniac would you be able to help land this PR?
Otherwise LGTM
Great, please mark it ready then if there are no other concerns.
@maleksan85 apologies, LGTM too. Would it make sense to apply a similar change to the other benchmark scripts: `benchmark_latency.py`, `benchmark_prefix_caching.py`, `benchmark_prioritization.py`?
As far as I understand, `benchmark_latency.py` only sends token ids instead of text prompts, so it inherently supports `skip_tokenizer_init` - that is, the same behavior as the change I made here for the throughput benchmark (see vllm/benchmarks/benchmark_latency.py, line 58 at d0feea3).
For the rest of the benchmarks, it is up to those functionalities' developers.
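For reference, the pattern being described - dummy token ids fed straight to the engine, so no tokenizer call sits inside the timed region - looks roughly like this (a sketch only; shapes, model name, and values are placeholders, not the literal benchmark code):

```python
import numpy as np

from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

batch_size, input_len, output_len = 8, 128, 128

# Random ids stand in for real prompts; nothing here touches a tokenizer.
dummy_ids = np.random.randint(10000, size=(batch_size, input_len))
prompts = [TokensPrompt(prompt_token_ids=row.tolist()) for row in dummy_ids]

llm = LLM(model="facebook/opt-125m", skip_tokenizer_init=True)
params = SamplingParams(max_tokens=output_len, ignore_eos=True,
                        detokenize=False)

# In the latency benchmark, only this call would sit inside the timer.
outputs = llm.generate(prompts, params)
```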
Fixing support for vLLM mode to run without a tokenizer in the throughput benchmark.
Recreating the accidentally corrupted PR: #12489
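Roughly, the behavior this PR restores in benchmark_throughput.py can be sketched as follows (a simplified illustration assuming requests carry pre-computed token ids; the request tuple shape and function name are hypothetical, not the literal diff):

```python
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

def run_vllm(requests, model: str, skip_tokenizer_init: bool = False):
    """requests: (prompt_text, prompt_token_ids, output_len) tuples -
    a hypothetical shape, not the benchmark's actual data structure."""
    llm = LLM(model=model, skip_tokenizer_init=skip_tokenizer_init)
    prompts, params = [], []
    for prompt_text, prompt_token_ids, output_len in requests:
        if skip_tokenizer_init:
            # No tokenizer in the engine: requests must carry token ids.
            prompts.append(TokensPrompt(prompt_token_ids=prompt_token_ids))
        else:
            prompts.append(prompt_text)
        params.append(SamplingParams(max_tokens=output_len, ignore_eos=True,
                                     detokenize=not skip_tokenizer_init))
    return llm.generate(prompts, params)
```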