Bugfix: Delay pattern mask is applied twice #110

Status: Open · wants to merge 4 commits into main
1 change: 1 addition & 0 deletions parler_tts/dac_wrapper/modeling_dac.py
@@ -11,6 +11,7 @@
 
 class DACModel(PreTrainedModel):
     config_class = DACConfig
+    main_input_name = "input_values"
 
     def __init__(self, config):
         super().__init__(config)
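
For context on this one-line change: main_input_name is a standard PreTrainedModel class attribute (defaulting to "input_ids") that generic transformers utilities read to locate a model's primary forward argument; DACModel consumes raw waveforms rather than token ids, so it needs to advertise "input_values". A minimal illustration, not part of the diff:

from transformers import PreTrainedModel

# The base-class default; generation and wrapper utilities read this
# attribute to find the model's primary input tensor.
print(PreTrainedModel.main_input_name)  # -> "input_ids"

# DACModel overrides it so those utilities hand the waveform tensor to
# forward() instead of token ids:
# class DACModel(PreTrainedModel):
#     main_input_name = "input_values"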
4 changes: 3 additions & 1 deletion parler_tts/modeling_parler_tts.py
@@ -3387,7 +3387,8 @@ def generate(
         )
 
         # build the delay pattern mask for offsetting each codebook prediction by 1 (this behaviour is specific to Parler-TTS)
-        input_ids, decoder_delay_pattern_mask = self.decoder.build_delay_pattern_mask(
+        # but don't overwrite the input_ids tensor with the delay pattern mask. We perform that later
+        _, decoder_delay_pattern_mask = self.decoder.build_delay_pattern_mask(
Comment on lines +3390 to +3391
Collaborator:
As pointed out, this is a redundant operation that has no impact on the results!

Author:
Hmm, I think this line does change the results when using enrolled tokens. Perhaps your setup works because it is slightly different, as you've described below. I'll try this and get back to you.

Author:
Ok, so my testing shows that this fix is required to get the right audio when doing enrolment (see the sketch of the masking behaviour after this diff hunk). Here is an example audio file generated with and without the fix:
audio.zip

             input_ids,
             bos_token_id=generation_config._bos_token_tensor,
             pad_token_id=generation_config._pad_token_tensor,
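
To make the thread above concrete: in the MusicGen-style decoder that Parler-TTS builds on, the delay pattern mask is re-applied inside the decoding loop, so generate only needs the mask itself at this point, not the rewritten input_ids. A sketch of that helper, paraphrased from the MusicGen implementation (the exact Parler-TTS code may differ):

import torch

def apply_delay_pattern_mask(input_ids, decoder_pad_token_mask):
    # Positions where the mask is -1 are "free": they keep the enrolled or
    # freshly generated token. Every other position is forced to the BOS/PAD
    # token dictated by the delay pattern.
    seq_len = input_ids.shape[-1]
    decoder_pad_token_mask = decoder_pad_token_mask[..., :seq_len]
    return torch.where(decoder_pad_token_mask == -1, input_ids, decoder_pad_token_mask)

Because this runs again during decoding, overwriting input_ids with the already shifted (and truncated) tensor returned by build_delay_pattern_mask amounts to applying the pattern twice. With a plain BOS-only prompt the two applications coincide, which would explain why the redundancy is invisible until enrolled audio tokens are present.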
@@ -3442,6 +3443,7 @@ def generate(
             generation_config=generation_config,
             synced_gpus=synced_gpus,
             streamer=streamer,
+            logits_warper=None,
Collaborator:
You should keep the logits_warper; I'm not sure why you removed it!

Author:
I didn't remove it! Originally, logits_warper wasn't being passed in at all, so this part of the code was failing. I believe that for greedy search, logits_warper=None should be set explicitly (see the dispatch sketch after this hunk). Could you please double-check this?

             **model_kwargs,
         )

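To illustrate the author's point: a hypothetical sketch of the dispatch around this call (simplified names, not the verbatim Parler-TTS code), assuming a transformers version whose unified _sample method serves both greedy and sampled decoding and requires an explicit logits_warper argument:

# Inside the model's generate(); `self` is the model (hypothetical sketch).
if generation_config.do_sample:
    # Sampling: temperature / top-k / top-p warpers reshape the distribution.
    logits_warper = self._get_logits_warper(generation_config)
else:
    # Greedy decoding: no warping is wanted, but the parameter is still part
    # of the method's signature, so it is passed explicitly as None rather
    # than omitted (omitting it is what made this code path fail).
    logits_warper = None

outputs = self._sample(
    input_ids,
    logits_processor=logits_processor,
    logits_warper=logits_warper,
    generation_config=generation_config,
    synced_gpus=synced_gpus,
    streamer=streamer,
    **model_kwargs,
)

Whether passing None is acceptable depends on the pinned transformers version: later releases fold the warpers into logits_processor and drop the argument from the internal method entirely, so this is worth verifying against the version the repo pins.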