At this line, a delay pattern mask is generated and applied to the initial audio IDs. Then, at this line, a second mask is generated to revert the delay on the output tokens. However, that second mask is built from `input_ids`, which already include the delay pattern.
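To see why building the revert mask from already-delayed ids is a problem, here is a toy model in plain Python. It is a sketch, not the library's code: the functions `apply_delay` and `revert_delay` and the `BOS`/`PAD` ids are hypothetical, and the real mask tensor is replaced by a simple per-codebook shift (codebook `k` delayed by `k` steps). Applying the delay once and reverting once round-trips. Treating the delayed ids as if they were undelayed counts the delay twice, so a single revert no longer recovers the prompt.

```python
from typing import List

# Hypothetical special-token ids for this toy model (not real model ids).
BOS = -1
PAD = -2


def apply_delay(codes: List[List[int]]) -> List[List[int]]:
    """Shift codebook k right by k steps: BOS fills the front, PAD fills the end."""
    num_codebooks = len(codes)
    return [
        [BOS] * k + list(row) + [PAD] * (num_codebooks - 1 - k)
        for k, row in enumerate(codes)
    ]


def revert_delay(delayed: List[List[int]]) -> List[List[int]]:
    """Undo ONE application of apply_delay by un-shifting codebook k by k steps."""
    num_codebooks = len(delayed)
    length = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + length] for k, row in enumerate(delayed)]


# A non-BOS audio prompt: 3 codebooks x 3 frames (the "continuation" case).
prompt = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]

once = apply_delay(prompt)
# Correct: the revert corresponds to the single delay that was applied.
assert revert_delay(once) == prompt

# Buggy analogue: delayed ids are treated as undelayed, so the delay is
# applied twice and one revert only gets back to the once-delayed ids.
twice = apply_delay(once)
print(revert_delay(twice) == prompt)  # False
print(revert_delay(twice) == once)    # True
```

In this toy model the fix is to derive the revert from the undelayed prompt (or to reuse the mask from the first call) instead of rebuilding it from the delayed `input_ids`.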
Yes, I understand the intention. However, the current code does not regenerate the mask correctly in all cases. If the initial `input_ids` contain only BOS tokens, it works perfectly. But if you try to generate a continuation of some audio, `input_ids` will be modified here, and the mask generated here will be incorrect.