I'm quite pleased with the speed-up of read_fast.py over read.py. On an M1 Mac Studio, generating 25 seconds of audio took 27 minutes with read.py on preset fast, 17 minutes on preset ultra_fast, and 6 minutes with read_fast.py. It doesn't appear that read_fast.py supports any preset options.

However, I'm noticing a number of audio quality issues with the read_fast.py output. Most severely, the transcript consisted of two sentences, and several words of the second sentence were dropped entirely. That won't do, of course. What techniques can I employ to ensure that no words are dropped?

Beyond that, the quality is broadly worse with read_fast.py. In particular, the prosody is much worse, especially the pauses. I find pauses problematic across the board with tortoise: I have yet to generate any audio I would consider usable for an audiobook, and I'm still only experimenting with short paragraphs. The whole endeavor feels hopeless, but I'm hoping there are knobs I can fiddle with to improve it.

So, what can be done to improve the overall audio quality of read_fast.py, especially its handling of pauses, and what can be done to improve the prosody (especially pauses) of tortoise across all usages?
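For reference, the timings quoted above can be turned into real-time factors with a short sketch like the one below. The durations are the rounded figures from this post, and the names (`runs_minutes`, `real_time_factor`) are only illustrative; nothing here calls tortoise itself.

```python
# Rounded wall-clock times quoted in this post for 25 s of generated audio
# (M1 Mac Studio). These are the only inputs; no tortoise code is involved.
AUDIO_SECONDS = 25
runs_minutes = {
    "read.py (fast)": 27,
    "read.py (ultra_fast)": 17,
    "read_fast.py": 6,
}


def real_time_factor(minutes, audio_seconds=AUDIO_SECONDS):
    """Seconds of compute spent per second of generated audio."""
    return minutes * 60 / audio_seconds


# Real-time factor per run, and each run's time relative to read_fast.py.
rtf = {name: real_time_factor(m) for name, m in runs_minutes.items()}
baseline = runs_minutes["read_fast.py"]
relative = {name: m / baseline for name, m in runs_minutes.items()}

for name in runs_minutes:
    print(f"{name}: {rtf[name]:.1f}x real time, "
          f"{relative[name]:.2f}x the read_fast.py time")
```

By these figures, read_fast.py is about 4.5 times faster than preset fast and about 2.8 times faster than ultra_fast, yet it still runs roughly 14 times slower than real time.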
Thank you.