
gather_generation_logits doesn't seem to work correctly for SequenceClassification models #2615

Open
TriLoo opened this issue Dec 24, 2024 · 0 comments
Labels
Generic Runtime · Investigating · triaged (Issue has been triaged by maintainers)

Comments


TriLoo commented Dec 24, 2024

I have a sequence classification task using Qwen with num_labels = 5, and I want to use gather_generation_logits to output the logits for the generated token. However, there is an issue: the number of classification labels is 5, but the returned generation_logits has size [vocab_size], and beyond the first 5 positions all logits are zero.

Additionally, when the batch size is greater than 1, only the first sample's first 5 logits are correct; all logits for the remaining samples are zero. It seems that the executor class in TensorRT-LLM's implementation of gather_generation_logits does not properly handle the case where the label count differs from the vocab size, potentially causing invalid CUDA memory access and producing zeros.
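As a hedged illustration only (the executor source for this path is not public, so this is a guess at the failure mode, not TensorRT-LLM's actual code): the observed symptom is consistent with the classification head's [batch, num_labels] logits being copied flat into a zero-initialized [batch, vocab_size] generation-logits buffer, so all values land in the first row and later rows stay zero.

```python
import numpy as np

# Hypothetical reproduction of the symptom; sizes are stand-ins.
num_labels = 5    # classification head output size (num_labels = 5 in the report)
vocab_size = 16   # stand-in for the real vocab size
batch = 2

# What the classification head actually produces: shape [batch, num_labels].
head_logits = np.arange(1, batch * num_labels + 1, dtype=np.float32).reshape(
    batch, num_labels
)

# Destination buffer sized for generation logits: [batch, vocab_size], zero-filled.
gen_logits = np.zeros((batch, vocab_size), dtype=np.float32)

# One plausible bug: a flat copy of batch * num_labels values into the flat
# destination, ignoring the per-row stride mismatch (num_labels vs vocab_size).
gen_logits.ravel()[: batch * num_labels] = head_logits.ravel()

# Result matches the report: sample 0's first num_labels entries are correct,
# the rest of row 0 holds misplaced values from sample 1, and row 1 is all zeros.
print(gen_logits[0, :num_labels])
print(gen_logits[1])
```

Under this (assumed) failure mode, the fix would be a strided per-sample copy: each sample's num_labels values written at offset `i * vocab_size`, with the tail of each row left zero or masked.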

Also, the implementation of this part of the executor does not appear to be open-sourced; it is currently shipped as a .a file. Could you please share some details about how this part is implemented? Thank you!

@github-actions github-actions bot added triaged Issue has been triaged by maintainers Investigating labels Dec 24, 2024

2 participants