Usefulness of conditioning latents #84
-
I am looking for ways to add consistency and reduce system resource usage and inference times. I saved the conditioning latents to a file, thinking that might speed things up, since tortoise appears to compute this tuple by default and loading it would theoretically skip that step. It didn't seem to speed things up, and the results were the same (which I would expect). Is there anything else I can do to reduce system resources and inference times?
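For reference, a disk-caching pattern for the latent tuple might look like the sketch below. `compute_latents` is a stand-in for however the tuple is derived (e.g. tortoise's `get_conditioning_latents`), and `pickle` stands in for `torch.save`/`torch.load`; only the caching pattern itself is the point here.

```python
import os
import pickle

def compute_latents():
    """Stand-in for the expensive step, e.g.
    tts.get_conditioning_latents(voice_samples) in tortoise."""
    return ("autoregressive_latent", "diffusion_latent")  # dummy tuple

def load_or_compute_latents(path: str):
    """Load the cached latent tuple if present, else compute and save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    latents = compute_latents()
    with open(path, "wb") as f:
        pickle.dump(latents, f)
    return latents

cache_path = "voice_latents.pkl"
first = load_or_compute_latents(cache_path)   # computes and saves
second = load_or_compute_latents(cache_path)  # loads from disk
os.remove(cache_path)  # clean up after the example
```

As the thread notes, this only skips the latent computation itself, not model loading, which is where most of the time goes.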
Replies: 2 comments 1 reply
-
The best advice I can offer that doesn't require a substantial amount of engineering effort is to tune the generation parameters with the goal of reducing the number of autoregressive samples generated and the number of diffusion passes made. All of the configuration options were tuned with quality in mind over speed, but there are likely some gains to be had if the priorities were swapped, probably at no great cost in overall quality.
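A sketch of what swapping those priorities might look like. The keyword argument names `num_autoregressive_samples` and `diffusion_iterations` match tortoise's `tts()` signature in recent versions (verify against `api.py` in your checkout), but the numeric values here are purely illustrative, not tortoise's actual preset values:

```python
# Illustrative speed-vs-quality generation settings (values are
# placeholders; check tortoise's api.py for the real defaults/presets).
SPEED_TUNED = {
    "num_autoregressive_samples": 16,  # far fewer AR samples
    "diffusion_iterations": 30,        # far fewer diffusion passes
}

QUALITY_TUNED = {
    "num_autoregressive_samples": 256,
    "diffusion_iterations": 400,
}

def pick_settings(prefer_speed: bool) -> dict:
    """Return generation kwargs biased toward speed or quality."""
    return dict(SPEED_TUNED if prefer_speed else QUALITY_TUNED)

settings = pick_settings(prefer_speed=True)
# Hypothetical usage, requires tortoise installed and a voice loaded:
# audio = tts.tts("Hello world", voice_samples=samples, **settings)
```

The trade-off is direct: each autoregressive sample and each diffusion pass is a forward pass you pay for, so cutting their counts cuts wall-clock time roughly proportionally.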
-
This time is predominantly spent loading the model files (~4GB of them) from disk. If you use tortoise programmatically, this should only happen when you first instantiate TextToSpeech; after that, the loading time is skipped for each call to tts(). Generating the conditioning latents is relatively fast: it should only take a few milliseconds once everything is in memory. If that's not the case, there's a bug here.
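That load-once behavior can be sketched with a lazy singleton. The `TextToSpeech` class below is a dummy stand-in that only simulates the load delay (in real use you'd do `from tortoise.api import TextToSpeech` instead); the pattern itself is generic:

```python
import time
from functools import lru_cache

class TextToSpeech:
    """Stand-in for tortoise's TextToSpeech; the sleep simulates
    the ~4GB model load from disk at instantiation time."""
    def __init__(self):
        time.sleep(0.1)

@lru_cache(maxsize=1)
def get_tts() -> TextToSpeech:
    """Load the models once; later calls reuse the same instance."""
    return TextToSpeech()

start = time.perf_counter()
first = get_tts()           # pays the load cost
cold = time.perf_counter() - start

start = time.perf_counter()
second = get_tts()          # cached: effectively free
warm = time.perf_counter() - start
```

Keeping one instance alive for the lifetime of the process (a module-level object works just as well as `lru_cache`) is what makes per-call latency dominated by generation rather than loading.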