Prep for Voice Steering feature #141

Merged
merged 6 commits into from
Oct 14, 2024

apresence
Contributor

Credits:

1. ylacombe
   - Add input_values to DACModel
   - dac_wrapper/modeling_dac.py
   - huggingface#110 (comment)

2. stg2015
   - Delay mask adjustment for input_values
   - modeling_parler_tts.py
   - huggingface#81 (comment)

Guppy16 commented Sep 27, 2024

Heya, I am the author of #110 (comment).

It seems like we get similar results, but our implementations are slightly different. It looks like you are following #81 (comment) to adjust the delay pattern mask after the tokens have been generated by the Decoder, whereas my implementation modifies the mask before the tokens are injected into the Decoder.

Are our approaches equivalent, or is there a difference?
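
For readers unfamiliar with the term, here is a minimal sketch of what a delay pattern over audio codebooks looks like in general. It is not the Parler-TTS implementation and does not settle the question above. The function names and the `pad` value are hypothetical, chosen only for illustration.

```python
def apply_delay_pattern(codes, pad=-1):
    """Shift codebook k to the right by k steps, filling gaps with `pad`.

    `codes` is a list of equal-length rows, one row per codebook.
    """
    num_codebooks = len(codes)
    seq_len = len(codes[0])
    width = seq_len + num_codebooks - 1
    delayed = [[pad] * width for _ in range(num_codebooks)]
    for k, row in enumerate(codes):
        # Row k starts k steps later than row 0.
        delayed[k][k:k + seq_len] = row
    return delayed


def undo_delay_pattern(delayed):
    """Recover the aligned codes by reading each row from its own offset."""
    num_codebooks = len(delayed)
    seq_len = len(delayed[0]) - num_codebooks + 1
    return [row[k:k + seq_len] for k, row in enumerate(delayed)]


codes = [[1, 2, 3], [4, 5, 6]]
delayed = apply_delay_pattern(codes)
restored = undo_delay_pattern(delayed)
```

In this toy version the pad positions play the role of the mask: any prompt tokens supplied up front have to land at the shifted offsets, otherwise the codebooks are read out of alignment.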

apresence commented Sep 27, 2024 via email

apresence commented Sep 27, 2024

> Heya, I am the author of #110 (comment).
>
> It seems like we get similar results, but our implementations are slightly different. It looks like you are following #81 (comment) to adjust the delay pattern mask after the tokens have been generated by the Decoder, whereas my implementation modifies the mask before the tokens are injected into the Decoder.
>
> Are our approaches equivalent, or is there a difference?

I applied your suggestion to my change and got exactly the same output. Specifically, I ran the same inputs with and without your suggested change and compared hashes of the resulting WAV files: they are identical.
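
A check like the one described above could be sketched as follows. The helper names and the choice of SHA-256 are assumptions, since the comment does not say which hash was used. The comparison is only meaningful if generation is deterministic between runs (for example, with a fixed seed).

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    """True if both files are byte-for-byte identical (up to hash collision)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Calling `same_output` on the two WAV files (paths of your choosing) returns `True` only when the files match byte for byte, which is a stricter test than the audio merely sounding the same.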

I've adjusted my PR based on your suggestion.

Thanks!

Collaborator

Hey there, thanks for opening the PR!
Happy to be corrected here, but the modification proposed here seems incorrect. As highlighted in #110, I believe the modeling code is already correct and can be used even if you pass input_values.

Copy link

Heya @ylacombe I don't think the modelling code is correct in this case. I have an example in #110 with and without the changes.

#110 (comment)

I'm not sure why your testing shows that it's working. Would you be able to double check please?

Collaborator

@ylacombe ylacombe Oct 9, 2024

Could you provide a code snippet that reproduces your original result?

In my comment, I don't see the issue you've shared, except that main_input_name = "input_values" does indeed need to be added as a class attribute of DACModel: #110 (comment)
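
As a rough illustration of why such a class attribute matters, here is a toy sketch. In Hugging Face transformers, `main_input_name` is a class attribute of `PreTrainedModel` that defaults to `"input_ids"`. The classes and the `pick_main_input` method below are hypothetical stand-ins, not the library's code or the actual DACModel.

```python
class BaseModel:
    # Toy stand-in for a framework base class (hypothetical).
    # The default mirrors the convention used for text models.
    main_input_name = "input_ids"

    def pick_main_input(self, **kwargs):
        """Return the argument stored under this class's main input name."""
        name = self.main_input_name
        if name not in kwargs:
            raise KeyError(f"expected a '{name}' argument, got {sorted(kwargs)}")
        return kwargs[name]


class AudioCodecModel(BaseModel):
    # An audio model consumes waveform samples, so it overrides the default.
    main_input_name = "input_values"


model = AudioCodecModel()
samples = model.pick_main_input(input_values=[0.1, 0.2])
```

Without the override, the base class would look for `input_ids` and fail to find the audio that was passed as `input_values`.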

Copy link

Hmm. Here's the code snippet: #139 (comment)

To clarify, if you run this code snippet on the current version of ParlerTTS, it produces the "crackling"-like sound. With either this MR or #110, the crackling noise is fixed.

@ylacombe
Collaborator

Hey @apresence, sorry for the long delay! Turns out you were right. Merging to fix this! Thanks for the work.

@ajkessel

Does this PR enable inference on longer text, so the voice doesn't change with each chunk? If so, is there any documentation of how to actually do that? Or am I misunderstanding the purpose of this patch?

apresence commented Nov 17, 2024 via email
