Question regarding the speed of the generation #221
HyperUpscale started this conversation in General
On a 3080 10GB, generating a single sentence with Bark takes about 35 seconds. To be honest, the quality is not 35 times better than Coqui TTS, which takes about 1 second for the same sentence; it is maybe 2 times better.
Another example: three sentences take 1.5 seconds with Coqui TTS and 41 seconds with Bark.
Where does this roughly 40 times slower speed come from?

Reply:
It's a very different type of model. The goal of Bark is not to be a good TTS but rather a model that can create completely arbitrary audio from scratch (this could be high-quality speech, but also two people shouting at a soccer game with background music playing). That is a much harder task, and it needs a bigger and slower model. It happens to work well for TTS, but on the speed/quality trade-off, if you need clean speech and nothing else, it might not be the right tool.
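As a quick check on the "40 times slower" figure, the sketch below computes the slowdown from the two timings reported in the question, assuming each is the wall-clock time of a single run. It works out to about 35x for one sentence and about 27x for three sentences. The `time_call` helper is a hypothetical name, not part of Bark or Coqui TTS; you would pass it your own zero-argument synthesis call when re-measuring. A single run may include one-off costs such as model loading, so timing several calls gives a fairer comparison.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Timings reported in the question (3080 10GB), in seconds:
# (description, Coqui TTS seconds, Bark seconds)
REPORTED_TIMINGS = [
    ("one sentence", 1.0, 35.0),
    ("three sentences", 1.5, 41.0),
]


def slowdown(fast_seconds: float, slow_seconds: float) -> float:
    """Return how many times longer the slower system took than the faster one."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


def time_call(fn: Callable[[], T]) -> Tuple[float, T]:
    """Hypothetical helper: run fn once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


if __name__ == "__main__":
    for label, coqui_s, bark_s in REPORTED_TIMINGS:
        ratio = slowdown(coqui_s, bark_s)
        print(f"{label}: Bark took {ratio:.1f}x as long as Coqui TTS")
```

Running it prints a ratio of 35.0 for the one-sentence case and 27.3 for the three-sentence case.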