Add voices from Super Dialogue Audio Pack #425

n8bot · 2023-04-30T06:39:18Z

https://dillonbecker.itch.io/sdap

Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/

These voices perform very well. They can be arranged in different ways to evoke certain emotions. Example of an angry emotion produced with one of the voices:

n8bot · 2023-04-30T18:58:53Z

Attached is a sample of the output of some of the voices.

Newest samples:

Here is a demonstration of all the voices, as well as a failure of the "angry" voice to achieve consistency. I am puzzled, because earlier the same "angry" voice was quite consistent like the rest.

sdap12.zip

sdap12_angry_failure.zip

Old samples:

neonbjb

Hey - thanks for putting this together. Normally I wouldn't recommend more than 3 conditioning clips per voice. Do you find the model performs better with all these clips? Would you mind cutting it down a bit to keep the repo size in check?

n8bot · 2023-04-30T22:43:51Z

I did not do extensive testing with reduced numbers of these particular clips. They immediately worked so well, particularly the speech-only ones, that I left it as is with all the clips. The performance does not seem to be harmed by so many clips.

When I compare the output from these voices to other voices I have put together myself with fewer clips, and even some of the training voices, these voices perform much more consistently. The quality is high, for sure, but the consistency is what is nice.

For the PR, I could remove the _all voices, as the added non-speech vocalizations don't help with normal TTS use. However, they can sometimes be useful when crafting specific emotive voices from the clips.

Each variation of the voice (_all, _speech and the one _angry) has redundant copies of files. Rectifying this would reduce the added repo size significantly.
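One way to locate the redundant copies mentioned above is to group files by content hash. This is only a minimal sketch: the helper names `file_digest` and `find_duplicate_files` are hypothetical, and nothing here is part of the tortoise repo or of the actual cleanup done in this PR. It assumes Python 3.9 or newer.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_files(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash; keep groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicate_files(Path("voices"))` (the folder name is an assumption) would return one entry per set of byte-identical files, which shows how much size the redundant copies account for.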

Maybe what I will do is put the "extra" audio clips in a separate folder alongside/within the voices folder, so users can use them if they wish.

Does the script scan subfolders for audio clips, too? I could have the extra clips in a subfolder with a text file with instructions.
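To make the subfolder question concrete, here is a small sketch of the difference between a flat and a recursive scan of a voice folder. It does not describe how the tortoise loading script actually behaves; `list_clips` and the set of audio suffixes are assumptions made up for illustration.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever the clips really use.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def list_clips(voice_dir: Path, recursive: bool = False) -> list[Path]:
    """List audio clips in voice_dir, optionally descending into subfolders."""
    candidates = voice_dir.rglob("*") if recursive else voice_dir.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
```

With a flat scan, clips placed in an `extra/` subfolder would be ignored, so they would stay out of the voice unless a user copied them up a level; with a recursive scan they would be picked up automatically.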

The reason for the redundant license files was in case someone shared just a single voice, it would retain the license info.

Let me know if you have any thoughts. I'll look into the subfolder idea.


n8bot · 2023-04-30T23:54:15Z

I removed all binary file redundancy.

Now there are only redundant markdown files with license and instructions for constructing subset voices.

n8bot · 2023-05-01T00:41:51Z

I just did a test with an "angry" subset voice, and the results are way less consistent. The voice completely changes.

So it does appear that the sheer volume of clips contributes to consistency.

n8bot · 2023-05-01T16:25:14Z

By the way, I have no problem if you decide not to include this in your repo. It requires no maintenance, so on my end I can easily patch it on top of any changes you make. I just wanted to share it with anyone who might want it. It's open source and I had already done all the work, so I figured I might as well give you the choice to include it or not.

After doing a side-by-side test of these new voices against the default tortoise training voices, I must concede that the training voices are much better overall. So the purpose of these massive voices is not very clear.

n8bot · 2023-05-02T07:18:49Z

Interestingly, when I use the "angry" subset of clips for a voice, the consistency is much lower overall: random female voices and other voices pop up in clips. However, when the prompt consists of words that an angry person would plausibly say, the consistency is much greater and all candidate results are consistent.

G-force78 · 2023-06-04T14:23:19Z

> It's interesting, when I use the "angry" subset of clips for a voice, the consistency is much lower overall: random female voices and other voices pop up in clips. However, when the prompt is actually words that seem like something an angry person would say, the consistency is much greater and all candidate results are consistent.

I noticed something in your samples that I got too: a strange yelp-groan at the end of the passage. I don't know if it's misinterpreted emotional emphasis or something else.

This was referenced Apr 30, 2023

Voice / Speaker keeps changing mid sentence #423

Open

Stop Audio cut off at the end/is there a way to add time buffer? #416

Open

neonbjb requested changes Apr 30, 2023


Add voices from Super Dialogue Audio Pack

3c942ef

https://dillonbecker.itch.io/sdap Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/

n8bot force-pushed the sdap_voices branch from c30444d to 3c942ef Compare April 30, 2023 23:51


Old samples:

Example of a voice:

Example of a voice including non-speech vocalizations:

Example of a curated "angry" voice:
