
gather_generation_logits doesn't seem to work correctly for SequenceClassification models #2615

Open
TriLoo opened this issue Dec 24, 2024 · 0 comments
Labels
Generic Runtime · Investigating · triaged (Issue has been triaged by maintainers)

Comments


TriLoo commented Dec 24, 2024

I have a sequence classification task using Qwen with num_labels = 5, and I want to use gather_generation_logits to output the logits for the generated token. However, there is an issue: the number of classification labels is 5, but the returned generation_logits has size [vocab_size], and beyond the first 5 positions all logits are zero.

Additionally, when the batch size is greater than 1, only the first sample's first 5 logits are correct; all logits for the remaining samples are zero. It seems that the executor class in TensorRT-LLM's implementation of gather_generation_logits does not properly handle the case where the label count differs from the vocab size, potentially causing invalid CUDA memory access and producing zeros.
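As a hedged illustration only (the executor source for this path is not public, so this is a guess at the failure mode, not TensorRT-LLM's actual code): the observed symptom is consistent with the classification head's [batch, num_labels] logits being copied flat into a zero-initialized [batch, vocab_size] generation-logits buffer, so all values land in the first row and later rows stay zero.

```python
import numpy as np

# Hypothetical reproduction of the symptom; sizes are stand-ins.
num_labels = 5    # classification head output size (num_labels = 5 in the report)
vocab_size = 16   # stand-in for the real vocab size
batch = 2

# What the classification head actually produces: shape [batch, num_labels].
head_logits = np.arange(1, batch * num_labels + 1, dtype=np.float32).reshape(
    batch, num_labels
)

# Destination buffer sized for generation logits: [batch, vocab_size], zero-filled.
gen_logits = np.zeros((batch, vocab_size), dtype=np.float32)

# One plausible bug: a flat copy of batch * num_labels values into the flat
# destination, ignoring the per-row stride mismatch (num_labels vs vocab_size).
gen_logits.ravel()[: batch * num_labels] = head_logits.ravel()

# Result matches the report: sample 0's first num_labels entries are correct,
# the rest of row 0 holds misplaced values from sample 1, and row 1 is all zeros.
print(gen_logits[0, :num_labels])
print(gen_logits[1])
```

Under this (assumed) failure mode, the fix would be a strided per-sample copy: each sample's num_labels values written at offset `i * vocab_size`, with the tail of each row left zero or masked.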

Also, the implementation of this part of the executor does not appear to be open-sourced; it is currently shipped as a .a file. Could you please share some details about how this part is implemented? Thank you!

@github-actions github-actions bot added triaged Issue has been triaged by maintainers Investigating labels Dec 24, 2024

2 participants