Prep for Voice Steering feature #141

apresence · 2024-09-24T14:07:38Z

Credits:

ylacombe

Add input_values to DACModel
dac_wrapper/modeling_dac.py
Bugfix: Delay pattern mask is applied twice #110 (comment)

stg2015

Delay mask adjustment for input_values
modeling_parler_tts.py
Poor Audio Quality with input_values Input in Parler_TTS #81 (comment)

parler_tts/modeling_parler_tts.py

Guppy16 · 2024-09-27T12:44:20Z

Heya, I am the author of #110 (comment).

It seems like we have similar results, but our implementations are slightly different. It looks like you are following #81 (comment) to adjust the delay pattern mask after the tokens have been generated by the Decoder, whereas my implementation modifies the mask before the tokens are injected into the Decoder.

Are our approaches equivalent, or is there a difference?
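For readers unfamiliar with the delay pattern being discussed, here is a toy illustration in plain Python. It is not the Parler-TTS code: the function names and the BOS/PAD sentinel values are hypothetical, and it assumes the simple layout in which codebook k is shifted right by k steps. It only shows why applying the delay a second time (the problem named in the title of #110) leaves the codebooks misaligned after a single undo.

```python
from typing import List

# Hypothetical sentinel ids, chosen only for this illustration.
BOS, PAD = -1, -2


def apply_delay(codes: List[List[int]]) -> List[List[int]]:
    """Shift codebook k right by k steps: BOS in front, PAD at the tail."""
    num_codebooks = len(codes)
    return [
        [BOS] * k + list(row) + [PAD] * (num_codebooks - 1 - k)
        for k, row in enumerate(codes)
    ]


def undo_delay(delayed: List[List[int]]) -> List[List[int]]:
    """Reverse apply_delay by reading codebook k starting at offset k."""
    num_codebooks = len(delayed)
    length = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + length] for k, row in enumerate(delayed)]


# Three codebooks, four frames each.
codes = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

once = apply_delay(codes)
twice = apply_delay(once)      # delay applied a second time by mistake

recovered = undo_delay(twice)  # a single undo only gets back to `once`
truncated = [row[:len(codes[0])] for row in recovered]

print(undo_delay(once) == codes)  # round trip works when applied once
print(truncated == codes)         # codebooks 1 and 2 are still shifted
```

Whether the shift is corrected before the tokens enter the decoder or after they come out, the goal in this toy model is the same: each codebook should be delayed exactly once.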

apresence · 2024-09-27T15:40:29Z

Howdy! Thanks for the optimization. The power has been out here for over 6 hours due to Helene. Just got the genny fired up. After I take care of getting the fam all situated, I'll test your recommended change. Once I've got the code done, I can do a tutorial video if there is enough interest. Thanks!

apresence · 2024-09-27T16:24:59Z

I modified my change with your suggestion, and got exactly the same output. Specifically, I used the same inputs with and without your suggested change and compared a hash of the WAV file outputs and they are the same.
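A minimal sketch of that kind of check, assuming the two runs were saved to disk as WAV files. The helper names and the example file names are hypothetical, and the comparison is only meaningful if generation is deterministic (same inputs and seed), since a hash match requires byte-identical files.

```python
import hashlib
from pathlib import Path


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a: str, path_b: str) -> bool:
    """True when both files are byte-for-byte identical."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Usage would look like `same_output("before_change.wav", "after_change.wav")`, where both file names are placeholders for the outputs of the two runs.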

I've adjusted my PR based on your suggestion.

Thanks!

ylacombe · 2024-10-09T13:50:43Z

parler_tts/modeling_parler_tts.py

Hey there, thanks for opening the PR!
Happy to be corrected here, but the modification proposed here seems incorrect. As highlighted in #110, I believe the modeling code is already correct and can be used even if you pass input_values.

Guppy16 · Oct 9, 2024

Heya @ylacombe, I don't think the modelling code is correct in this case. I have an example in #110 with and without the changes.

#110 (comment)

I'm not sure why your testing shows that it's working. Would you be able to double-check, please?

ylacombe · Oct 9, 2024

Could you provide a code snippet that reproduces your original result?

In my comment, I don't have the issue you've shared (except that main_input_name = "input_values" indeed needs to be added as a class attribute of DACModel): #110 (comment)

Guppy16 · Oct 9, 2024

Hmm. Here's the code snippet: #139 (comment)

To clarify, if you try this code snippet on the current version of ParlerTTS, it produces a "crackling" sound. If you try it with either this MR or #110, the crackling noise is fixed.

parler_tts/modeling_parler_tts.py

ylacombe · 2024-10-14T12:07:27Z

Hey @apresence, sorry for the long delay! Turns out you were right. Merging to fix this! Thanks for the work.

ajkessel · 2024-11-15T21:33:00Z

Does this PR enable inference on longer text, so the voice doesn't change with each chunk? If so, is there any documentation of how to actually do that? Or am I misunderstanding the purpose of this patch?

apresence · 2024-11-17T16:32:14Z

Yes. And I am working on it, just got tied up with other things.

Prep for Voice Steering feature

d3c7fdc

apresence mentioned this pull request Sep 24, 2024

Voice Consistency Working Pretty Well -- Plus Zero-Shot Cloning! #139

Open

Prep for voice steering/cloning w/ fix for non-streaming generation

5c3519b

Guppy16 reviewed Sep 27, 2024

Applied simpler input handling per Guppy16's suggestion

b8fd4cd

apresence added 2 commits September 27, 2024 12:50

Applied Guppy16's suggested optimization

c95da97

Applied Guppy16's suggested optimization for voice steering

91063f3

ylacombe reviewed Oct 9, 2024

ylacombe reviewed Oct 14, 2024

Update parler_tts/modeling_parler_tts.py

3eb32d4

ylacombe merged commit 31816bd into huggingface:main Oct 14, 2024

This was referenced Oct 14, 2024

Bugfix: Delay pattern mask is applied twice #110

Closed

any list of all 36 voices? #95

Closed

Poor Audio Quality with input_values Input in Parler_TTS #81

Closed

Fix how delayed pattern mask is applied #147

Merged

ajkessel mentioned this pull request Nov 15, 2024

Long audio generation #136

Open
