
[Bugfix][V1] Fix allowed_token_ids for v1 Sampler #14169

Merged
merged 4 commits into vllm-project:main from fix_v1_allowed_token_ids on Mar 5, 2025

Conversation

@houseroad houseroad (Contributor) commented Mar 4, 2025

Swap the ones and zeros in the allowed_token_ids mask so that masked_fill_ is applied to the correct (disallowed) tokens.

Tested with 'test_allowed_token_ids' in #14159

Additional test:

    batched_output = xx.generate(
        [PROMPT, PROMPT],
        [
            SamplingParams(allowed_token_ids=allowed_token_ids),
            SamplingParams(),
        ]
    )
    assert batched_output[1].outputs[0].token_ids[-1] != TOKEN_ID
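
For context, here is a minimal standalone sketch of the mask convention this fix restores (plain PyTorch, not vLLM's actual sampler code; the vocabulary size and token ids are made up). masked_fill_ writes the fill value wherever the mask is True, so the mask must be True for the disallowed tokens:

    import torch

    vocab_size = 8               # illustrative only
    allowed_token_ids = [1, 3]   # illustrative only

    # masked_fill_ fills positions where the mask is True, so the mask must
    # be True for the *disallowed* tokens: start all-True, clear the allowed.
    mask = torch.ones(vocab_size, dtype=torch.bool)
    mask[allowed_token_ids] = False

    logits = torch.randn(vocab_size)
    logits.masked_fill_(mask, float("-inf"))

    # Only the allowed tokens keep finite logits and remain sampleable.
    assert torch.isfinite(logits[allowed_token_ids]).all()
    assert torch.isinf(logits[0])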

github-actions bot commented Mar 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Mar 4, 2025
@houseroad houseroad force-pushed the fix_v1_allowed_token_ids branch from 2439e4a to a5457a6 on March 4, 2025 03:30
@houseroad houseroad changed the title Fix allowed_token_ids for v1 Sampler [Bugfix] Fix allowed_token_ids for v1 Sampler Mar 4, 2025
@houseroad houseroad marked this pull request as ready for review March 4, 2025 03:34
@ywang96 ywang96 (Member) left a comment

LGTM! Thanks for fixing this and I left two minor comments!

Comment on lines 95 to 96
        if params.allowed_token_ids is not None and len(
                params.allowed_token_ids) == 0:

This can be just if not params.allowed_token_ids since if params.allowed_token_ids is None it would have already returned above.
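
As a standalone illustration of why the shorter guard is safe (the function below is hypothetical, not vLLM's actual code): not treats both None and an empty list as falsy, so the empty-list check only works once the None case has been handled separately.

    def check_allowed_token_ids(allowed_token_ids):
        # Hypothetical standalone version of the guard under discussion.
        if allowed_token_ids is None:
            return  # no restriction requested; the real code returns earlier
        if not allowed_token_ids:  # empty list is falsy: replaces len(...) == 0
            raise ValueError("allowed_token_ids cannot be empty")

    check_allowed_token_ids(None)       # fine: no allow-list given
    check_allowed_token_ids([1, 2, 3])  # fine: non-empty allow-list
    # check_allowed_token_ids([])       # would raise ValueError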

Comment on lines 303 to 313
-            self.allowed_token_ids_mask = torch.zeros(self.max_num_reqs,
-                                                      self.vocab_size,
-                                                      dtype=torch.bool,
-                                                      device=self.device)
-            self.allowed_token_ids_mask_cpu_tensor = torch.zeros(
+            self.allowed_token_ids_mask = torch.ones(self.max_num_reqs,
+                                                     self.vocab_size,
+                                                     dtype=torch.bool,
+                                                     device=self.device)
+            self.allowed_token_ids_mask_cpu_tensor = torch.ones(
                 self.max_num_reqs,
                 self.vocab_size,
                 dtype=torch.bool,
                 device="cpu")
             self.allowed_token_ids_mask_cpu_tensor[req_index][
-                sampling_params.allowed_token_ids] = True
+                sampling_params.allowed_token_ids] = False
@ywang96 ywang96 (Member) commented Mar 4, 2025

I think the updated logic here is a bit counter-intuitive to readers at first glance. Can we add a NOTE here that all token ids with a True mask will be filled with float("-inf") during sampling?

@houseroad (Contributor, Author):

yeah, added some comments

@njhill njhill (Member) left a comment

Thanks @houseroad, looks good.

Comment on lines 95 to 98
        if params.allowed_token_ids is not None and len(
                params.allowed_token_ids) == 0:
            raise ValueError("allowed_token_ids is not None and empty!")
        if not all(0 <= tid < self.model_config.get_vocab_size()

Suggested change
-        if params.allowed_token_ids is not None and len(
-                params.allowed_token_ids) == 0:
-            raise ValueError("allowed_token_ids is not None and empty!")
-        if not all(0 <= tid < self.model_config.get_vocab_size()
+        if not params.allowed_token_ids:
+            raise ValueError("allowed_token_ids cannot be empty")
+        vocab_size = self.model_config.get_vocab_size()
+        if not all(0 <= tid < vocab_size

@houseroad (Contributor, Author):

Makes sense, addressed the comments.

@njhill njhill changed the title [Bugfix] Fix allowed_token_ids for v1 Sampler [Bugfix][V1] Fix allowed_token_ids for v1 Sampler Mar 4, 2025
Signed-off-by: Lu Fang <[email protected]>
@ywang96 ywang96 (Member) left a comment

Please check the discussion on Slack regarding self.allowed_token_ids_mask - we need to take into account the situation where requests in the same batch have different sampling params, which is fairly common in the online setting.

@njhill njhill (Member) commented Mar 4, 2025

@liangfu yes, thanks to @robertgshaw2-redhat for pointing out that we only want to invert the mask for the applicable rows. So the tensors should still be initialized to zeros / reset to zeros when a request is removed, and when adding a request, just fill its row with ones before setting the allowed tokens to zero.
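
A minimal sketch of the per-row scheme described above (standalone PyTorch; the names, shapes, and helper functions are illustrative, not vLLM's actual InputBatch code):

    import torch

    max_num_reqs, vocab_size = 4, 8   # illustrative only
    # Initialize to zeros: rows that are never touched mask nothing.
    mask = torch.zeros(max_num_reqs, vocab_size, dtype=torch.bool)

    def add_request(req_index, allowed_token_ids):
        # Invert only this request's row: disallow everything...
        mask[req_index].fill_(True)
        # ...then re-enable the allowed tokens.
        mask[req_index][allowed_token_ids] = False

    def remove_request(req_index):
        # Reset the row to zeros so it masks nothing for the next occupant.
        mask[req_index].fill_(False)

    add_request(0, [1, 3])    # row 0 restricts sampling to tokens 1 and 3
    assert not mask[1].any()  # row 1 untouched: a request in the same batch
                              # without an allow-list is unaffected
    remove_request(0)
    assert not mask[0].any()  # the removed request's row is clean again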

@@ -359,7 +362,7 @@ def remove_request(self, req_id: str) -> Optional[int]:
         self.logit_bias[req_index] = None
         self.has_allowed_token_ids.discard(req_id)
         if self.allowed_token_ids_mask_cpu_tensor is not None:
-            self.allowed_token_ids_mask_cpu_tensor[req_index].fill_(False)
+            self.allowed_token_ids_mask_cpu_tensor[req_index].fill_(True)

@houseroad this should also be reverted right?

@houseroad (Contributor, Author):

True, added more comments to help readers understand.

@njhill njhill (Member) left a comment

Thanks @houseroad, LGTM now

@ywang96 ywang96 (Member) left a comment

Thanks for the fix!

@ywang96 ywang96 added the ready label Mar 5, 2025
@ywang96 ywang96 enabled auto-merge (squash) March 5, 2025 07:26
@ywang96 ywang96 merged commit 8d6cd32 into vllm-project:main Mar 5, 2025
47 checks passed
njhill added a commit to njhill/vllm that referenced this pull request Mar 5, 2025
This was missed when merging vllm-project#14169 and vllm-project#14159

Signed-off-by: Nick Hill <[email protected]>
johnnynunez pushed a commit to johnnynunez/vllm that referenced this pull request Mar 6, 2025
Labels: ready, v1
3 participants