-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Text to Speech #209
Comments
+1 Would also love to see the support for Coqui TTS. |
It would be great to run Bark in Elixir, also recently this TTS model brought a lot of attention https://github.com/collabora/WhisperSpeech |
I hate to reiterate what's already been said but TTS in Bumblebee using Bark would be super valuable. Any chance of supporting it? Hugging face: https://huggingface.co/suno/bark |
Pull requests are always welcome. Starting with one of the models in Hugging Face Transformers is probably the easiest way to get started: https://huggingface.co/docs/transformers/en/tasks/text-to-speech |
Just adding this as an interesting model to support too https://huggingface.co/coqui/XTTS-v2 |
I tried to port Bark and later on WhisperSpeech, they use multiple models to convert text to semantics, semantics to audio and encode... anyway there are more promising models recently released https://huggingface.co/parler-tts/parler_tts_mini_v0.1 or |
@bartekupartek, do you have your implementation open? I'm trying to do the same I've read the docs but not sure where to start. |
@michelson not yet but working on it, this models aren't using standard layers or if at all they are in pickle format, I needed to move back to understand simpler models with axon first |
I'm currently playing around Tacotron 2 text-to-speech and since it's simplest TTS I've found I'm trying to reproduce it in Elixir, I used |
Correct. We would need to implement them in Elixir. Maybe @polvalente knows of an implementation that could be ported, otherwise we need to look if there are any Jax implementations. If not, maybe it needs to be a separate library we invoke. |
There are many kinds of vocoders. I think the best way to approach this would be to choose a specific model we want to support and work towards porting the one it uses. |
I was thinking it might be one of torchaudio vocoders like Griffin-Lim(outputs sounds robotic) or WaveRNN(most likely this) or Nvidia Waveglow to turn mel spectograms into audio, but I just read trough VALL-E paper Bark is based on:
It would be fun to have Tacotron 2 working end to end or hear how mel spectrograms sounds but it looks like it doesn't make sense for any recent models mentioned above that are using facebook/encodec to turn outputs into audio codes directly 🙇♂️ |
+1 |
Hello!
As Speech to Text models such as Whisper are added having access to some of the impressive AI Text to Speech models would be a nice way to close the loop!
My current suggestion for a model to support would be bark.
The text was updated successfully, but these errors were encountered: