Bug in generation code #85

Open
Squire-tomsk opened this issue Jul 10, 2024 · 3 comments

Comments

@Squire-tomsk

At this line, a delay pattern mask is generated and applied to the initial audio IDs. Then, at this line, a second mask is generated to revert the delay on the output tokens. However, that second mask is built from input_ids, which already include the delay pattern.

@henry-tujia

This step just regenerates the mask so that the model's actual output can be extracted from the ids, which already contain the mask.

@Squire-tomsk
Author

Yes, I understand the intention. However, the current code does not regenerate the mask correctly in all cases. If the initial input_ids contain only a vector of BOS tokens, it works perfectly. But if you try to generate a continuation of some audio, the input_ids will be modified at this line, and the mask generated at this line will be incorrect.
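
To make the failure mode concrete, here is a toy sketch in plain Python. It is not the repository's code: `apply_delay`, `revert_delay`, and the `BOS`/`PAD` values are hypothetical stand-ins, and the delay pattern is assumed to shift codebook k right by k steps. It only illustrates that treating already-delayed ids as if they were undelayed applies the shift twice, so a single revert no longer recovers the original prompt.

```python
# Hypothetical special token ids, chosen only for this illustration.
BOS, PAD = -1, -2


def apply_delay(codes, bos=BOS, pad=PAD):
    """Shift codebook k right by k steps (assumed delay pattern).

    The front of each row is filled with BOS and the tail with PAD so that
    all rows keep the same length.
    """
    num_codebooks = len(codes)
    return [
        [bos] * k + list(row) + [pad] * (num_codebooks - 1 - k)
        for k, row in enumerate(codes)
    ]


def revert_delay(delayed):
    """Undo one application of apply_delay."""
    num_codebooks = len(delayed)
    length = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + length] for k, row in enumerate(delayed)]


# An audio prompt with 2 codebooks and 3 frames.
prompt = [[1, 2, 3], [4, 5, 6]]

# Correct round trip: delay once, revert once.
delayed = apply_delay(prompt)
round_trip = revert_delay(delayed)

# Failure mode: the delayed ids are treated as undelayed and delayed again,
# so reverting once yields the delayed ids rather than the prompt.
delayed_twice = apply_delay(delayed)
wrong = revert_delay(delayed_twice)

print(round_trip == prompt)  # the single round trip recovers the prompt
print(wrong == prompt)       # the double delay does not
```

With a BOS-only input there is nothing to misalign, which would explain why that case works while audio continuation does not.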

@Guppy16

Guppy16 commented Aug 16, 2024

#110

Does this help?
