planning: fast open-source TTS for Ichigo #94
Why does only the F5-TTS sample work? Everything else seems pretty bad.
Tested on TTS Arena and added to Drive: Commercial, Non-Commercial, Unknown license.
After testing these models, it seems F5-TTS is the only open-source TTS that gets both the pronunciation of "Ichigo" and the reading of the acronym "AI" correct. The commercial ones have no problem with this, of course. The next question is whether F5-TTS inference will be fast enough. Will update after some testing.
F5-TTS VRAM usage is quite concerning. Can you make a direct comparison with Fish Speech?
I used nvtop to monitor F5-TTS on my machine. With their provided inference script, it requires 2.3 GB of GPU memory during inference. I tested this on a 214-word generation, which the inference script splits into 8 batches. Peak memory stays constant at 2.3 GB across the batches. For the 214-word generation: Audio generated: 87 s
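For a direct comparison between F5-TTS and Fish Speech, it may help to record the same numbers for both models from inside Python rather than by watching nvtop. The sketch below is a minimal, hypothetical helper (`benchmark_tts`, `TTSBenchmark`, and the `synthesize` wrapper are illustrative names, not part of either project). It assumes you wrap each model's inference code in a function that returns a 1-D sequence of samples plus the sample rate. It reports audio duration, wall-clock time, real-time factor (compute seconds per second of audio, below 1 means faster than real time) and, if PyTorch with CUDA is available, peak tensor memory. Note that `torch.cuda.max_memory_allocated` counts only tensors allocated through PyTorch's caching allocator, so it will read lower than nvtop, which shows the whole process footprint including the CUDA context and cached blocks.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

try:
    import torch  # optional: only used to read peak GPU memory
except ImportError:  # still usable on machines without PyTorch
    torch = None


@dataclass
class TTSBenchmark:
    audio_seconds: float
    wall_seconds: float
    peak_gpu_bytes: Optional[int]  # None when CUDA is unavailable

    @property
    def rtf(self) -> float:
        """Real-time factor: seconds of compute per second of generated audio."""
        return self.wall_seconds / self.audio_seconds


def benchmark_tts(
    synthesize: Callable[[str], Tuple[Sequence[float], int]], text: str
) -> TTSBenchmark:
    """Time one call of a hypothetical `synthesize(text) -> (samples, sample_rate)` wrapper."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()              # finish any earlier GPU work first
        torch.cuda.reset_peak_memory_stats()  # measure only this call
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    wall = time.perf_counter() - start
    peak = torch.cuda.max_memory_allocated() if use_cuda else None
    return TTSBenchmark(len(samples) / sample_rate, wall, peak)
```

Running the same test sentence (or the same 214-word passage) through a wrapper for each model gives directly comparable `rtf` and `peak_gpu_bytes` values on one machine, which avoids the kind of confusion caused by another job's memory showing up in nvtop.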
15 GB is nuts.
Sorry, before I edited this I said 15 GB, but that was an error on my part. Someone else's job hadn't cleared the GPU and was sitting on 15 GB.
We need to replace the current Fish Speech with a better TTS model.
WIP shortlist of possible candidates:
Test sentence:
I'm Ichigo, a local AI created by Homebrew Research. I'm here to help answer your questions and make your life easier.
Samples
https://drive.google.com/drive/folders/1FbR5H7rqirHDgxbjxO8Zwhxsj5y4t_mq?usp=sharing