I have a sequence classification task using Qwen with num_labels = 5, and I want to use gather_generation_logits to output the logits for the generated token. However, there is an issue: the number of classification labels is 5, but the returned generation_logits has a size of [vocab size]. Beyond the first 5 positions, all logits are zero.
Additionally, when the batch size is greater than 1, only the first sample's first 5 logits are correct; all logits for the other samples are zero. It seems that the executor's implementation of gather_generation_logits in TensorRT-LLM does not properly handle cases where the label count differs from the vocab size, potentially causing invalid CUDA memory access and resulting in zeros.
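The symptom described above can be checked with a small helper like the one below. This is a minimal sketch, not TensorRT-LLM code: `summarize_generation_logits` is a hypothetical name, and it assumes the logits returned for one generated step have already been copied to the host as a nested list of shape [batch size][vocab size]. The example data is synthetic and only mimics the reported pattern.

```python
def summarize_generation_logits(logits, num_labels):
    """Summarize, per sample, the first num_labels logits and the zero padding.

    logits: nested list of shape [batch_size][vocab_size] (assumed layout).
    num_labels: number of classification labels (5 in this report).
    """
    report = []
    for row in logits:
        if len(row) < num_labels:
            raise ValueError("row is shorter than num_labels")
        head = list(row[:num_labels])   # positions expected to hold class logits
        tail = row[num_labels:]         # positions beyond the label count
        report.append({
            "class_logits": head,
            "head_all_zero": all(v == 0.0 for v in head),
            "tail_all_zero": all(v == 0.0 for v in tail),
        })
    return report


# Synthetic data that mimics the reported behavior (not real model output).
vocab_size = 16   # tiny stand-in for the real vocab size
num_labels = 5
sample0 = [0.3, -1.2, 2.5, 0.1, -0.7] + [0.0] * (vocab_size - num_labels)
sample1 = [0.0] * vocab_size          # second sample comes back all zero

report = summarize_generation_logits([sample0, sample1], num_labels)
for index, entry in enumerate(report):
    print(index, entry)
```

In the pattern reported here, the first sample has `head_all_zero` false and `tail_all_zero` true, while every other sample in the batch has `head_all_zero` true.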
Actually, it seems that the implementation of this part in the executor is not open-sourced and is currently packaged as a .a file. Could you please share some details about how this part is currently implemented? Thank you!