Question regarding the speed of the generation #221

HyperUpscale · 2023-04-30T22:20:37Z

On a 3080 10GB, a one-sentence text takes about 35 seconds with Bark. To be honest, the quality is not 35 times better than Coqui TTS, which takes 1 second for the same text; it is more like 2 times better.

Another example: 3 sentences take 1.5 seconds with Coqui TTS and 41 seconds with Bark TTS.

Where does the 40 times slower speed come from?
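
For reference, the slowdown implied by the two timings quoted above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the `slowdown` helper is a hypothetical name introduced here, and the numbers are the ones reported in this post, not a fresh benchmark.

```python
def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times slower the first timing is than the second."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


# Timings quoted in the post, in seconds (Bark first, Coqui TTS second).
one_sentence = slowdown(35.0, 1.0)
three_sentences = slowdown(41.0, 1.5)

print(f"one sentence: {one_sentence:.1f}x slower")
print(f"three sentences: {three_sentences:.1f}x slower")
```

On these two samples the ratio comes out at 35x and roughly 27x, so "40 times" is a rounded-up figure.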

gkucsko · 2023-05-01T22:55:56Z

Maintainer

It's a very different type of model. The goal of Bark is not to create a good TTS system but rather a model that can create completely arbitrary audio from scratch (this could be high-quality speech, but also two people shouting at a soccer game with background music playing). That is a much harder task and needs a bigger and slower model. It happens to work well for TTS, but on the speed/quality trade-off, if you need clean speech and nothing else, it might not be the right tool.
