Usefulness of conditioning latents #84
-
I am looking for ways to add consistency and reduce system resource usage and inference times. I saved the conditioning latents to a file, thinking that might speed things up, since tortoise appears to compute this tuple by default and loading it would theoretically skip that step. It didn't seem to speed things up, and the results were the same (which I would expect). Is there anything else I can do to reduce system resources and inference times?
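For reference, a disk-caching pattern for the latent tuple might look like the sketch below. `compute_latents` is a stand-in for however the tuple is derived (e.g. tortoise's `get_conditioning_latents`), and `pickle` stands in for `torch.save`/`torch.load`; only the caching pattern itself is the point here.

```python
import os
import pickle

def compute_latents():
    """Stand-in for the expensive step, e.g.
    tts.get_conditioning_latents(voice_samples) in tortoise."""
    return ("autoregressive_latent", "diffusion_latent")  # dummy tuple

def load_or_compute_latents(path: str):
    """Load the cached latent tuple if present, else compute and save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    latents = compute_latents()
    with open(path, "wb") as f:
        pickle.dump(latents, f)
    return latents

cache_path = "voice_latents.pkl"
first = load_or_compute_latents(cache_path)   # computes and saves
second = load_or_compute_latents(cache_path)  # loads from disk
os.remove(cache_path)  # clean up after the example
```

As the thread notes, this only skips the latent computation itself, not model loading, which is where most of the time goes.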
Replies: 2 comments 1 reply
-
The best advice I can offer that doesn't require a substantial amount of engineering effort is to tune the generation parameters with the goal of reducing the number of autoregressive samples generated and the number of diffusion passes made. All of the configuration options were tuned with quality in mind over speed, but there are likely some gains to be had if the priorities were swapped, probably at no great cost in overall quality.
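A sketch of what swapping those priorities might look like. The keyword argument names `num_autoregressive_samples` and `diffusion_iterations` match tortoise's `tts()` signature in recent versions (verify against `api.py` in your checkout), but the numeric values here are purely illustrative, not tortoise's actual preset values:

```python
# Illustrative speed-vs-quality generation settings (values are
# placeholders; check tortoise's api.py for the real defaults/presets).
SPEED_TUNED = {
    "num_autoregressive_samples": 16,  # far fewer AR samples
    "diffusion_iterations": 30,        # far fewer diffusion passes
}

QUALITY_TUNED = {
    "num_autoregressive_samples": 256,
    "diffusion_iterations": 400,
}

def pick_settings(prefer_speed: bool) -> dict:
    """Return generation kwargs biased toward speed or quality."""
    return dict(SPEED_TUNED if prefer_speed else QUALITY_TUNED)

settings = pick_settings(prefer_speed=True)
# Hypothetical usage, requires tortoise installed and a voice loaded:
# audio = tts.tts("Hello world", voice_samples=samples, **settings)
```

The trade-off is direct: each autoregressive sample and each diffusion pass is a forward pass you pay for, so cutting their counts cuts wall-clock time roughly proportionally.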
-
This time is predominantly spent loading the model files (~4GB of them) from disk. If you use tortoise programmatically, this should only happen when you first instantiate TextToSpeech; after that, the loading time is skipped for each call to tts(). Generating the conditioning latents is relatively fast: it should only take a few milliseconds once everything is in memory. If that's not the case, there's a bug here.
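That load-once behavior can be sketched with a lazy singleton. The `TextToSpeech` class below is a dummy stand-in that only simulates the load delay (in real use you'd do `from tortoise.api import TextToSpeech` instead); the pattern itself is generic:

```python
import time
from functools import lru_cache

class TextToSpeech:
    """Stand-in for tortoise's TextToSpeech; the sleep simulates
    the ~4GB model load from disk at instantiation time."""
    def __init__(self):
        time.sleep(0.1)

@lru_cache(maxsize=1)
def get_tts() -> TextToSpeech:
    """Load the models once; later calls reuse the same instance."""
    return TextToSpeech()

start = time.perf_counter()
first = get_tts()           # pays the load cost
cold = time.perf_counter() - start

start = time.perf_counter()
second = get_tts()          # cached: effectively free
warm = time.perf_counter() - start
```

Keeping one instance alive for the lifetime of the process (a module-level object works just as well as `lru_cache`) is what makes per-call latency dominated by generation rather than loading.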