Hey there, tech enthusiasts! 👋 Are you tired of boring, robotic text-to-speech voices? 😴 Well, we've got some exciting news for you! 🎉 We're introducing SpeechStylis AI, the cutting-edge technology that's revolutionizing the world of text-to-speech synthesis with Python! 🚀
Imagine being able to generate natural-sounding speech from text input, with a tone and style that matches your personality or brand. 💬 That's exactly what SpeechStylis AI does! It uses advanced machine learning algorithms to analyze a large dataset of human speech recordings, and then generates new speech samples that sound like they were recorded by a real person. 🤯
Ready to give it a try? SpeechStylis AI is now available as a Python library, so you can easily integrate it into your own projects. 🛠️ Whether you're building a virtual assistant, creating an audiobook, or developing an accessibility tool, SpeechStylis AI has everything you need to make your vision a reality. 🏡
Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly so that more people can benefit from it.
| Type | Platforms |
| --- | --- |
| 🐛 Bug Reports | GitHub Issue Tracker |
| 🎁 Feature Requests & Ideas | GitHub Issue Tracker |
| 💻 Usage Questions | GitHub Discussions |
| 🗨️ General Discussion | GitHub Discussions or Discord |
- Pretrained Models: Explore a wide range of pretrained models in over 1100 languages.
- Versatile Tools: Utilize tools for training new models and fine-tuning existing ones in any language.
- Dataset Analysis: Leverage utilities for dataset analysis and curation.
- Tacotron: paper
- Tacotron2: paper
- Glow-TTS: paper
- Speedy-Speech: paper
- Align-TTS: paper
- FastPitch: paper
👩‍💻 SpeechStylis AI is tested on Ubuntu 18.04 with Python >= 3.9, < 3.12.
Tested Platforms:
- Ubuntu
- Kali Linux
- Google Cloud
```bash
pip install TTS
```
If you are on Ubuntu (Debian) or Kali Linux, you can also run the following commands to install from source:
```bash
git clone https://github.com/haydenbanz/SpeechStylis.git
```
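The README stops at the clone step, so the rest is a minimal sketch: assuming the repository follows the usual Python package layout (a `setup.py` or `pyproject.toml` at the root), an editable install from the cloned source would typically be:

```bash
cd SpeechStylis
pip install -e .
```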
To use your prerecorded audio, locate the `.py` file and find the section where the speaker's WAV file path is defined. Update the `speaker_wav_path` variable with the path to your audio file. Below is an example:
```python
# Original Code
speaker_wav_path = "/content/drive/MyDrive/audio.wav"
```
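After the edit, the line might look like the following; the path is only a placeholder for your own recording, not a file shipped with the project:

```python
# Updated Code (placeholder path: point this at your own WAV file)
speaker_wav_path = "/path/to/your_voice_sample.wav"
```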
```python
import torch
from TTS.api import TTS

# Pick the compute device
device = "cuda" if torch.cuda.is_available() else "cpu"

# List the available models
print(TTS().list_models())

# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Run TTS
# ❗ Since this is a multilingual voice-cloning model, we must set the target speaker_wav and language
# Text to speech, returning a list of amplitude values
wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")
# Text to speech, written straight to a file
tts.tts_to_file(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
```
Running a single-speaker German model:

```python
# Init TTS with the target model name
tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False).to(device)

# Run TTS and write the result to a file
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output_de.wav")  # "I am a test message."
```
```python
# Example voice cloning with YourTTS in English, French and Portuguese
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to(device)
tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr-fr", file_path="output.wav")
tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt-br", file_path="output.wav")
```
Converting the voice in `source_wav` to the voice of `target_wav`:

```python
tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav")
```
This way, you can clone voices by using any model:

```python
tts = TTS("tts_models/de/thorsten/tacotron2-DDC")
tts.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",  # German: "How do I say 'I love you' in Italian?"
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)
```
For Fairseq models, use the following name format: `tts_models/<lang-iso_code>/fairseq/vits`. You can find the language ISO codes here and learn about the Fairseq models here.
```python
# TTS with on-the-fly voice conversion
api = TTS("tts_models/deu/fairseq/vits")
api.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)
```
Synthesize speech on the command line. You can either use your own trained model or choose one from the provided list. If you don't specify a model, the default LJSpeech-based English model is used.
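A minimal sketch of typical CLI usage, assuming the standard `tts` entry point and flags (`--list_models`, `--text`, `--model_name`, `--out_path`) that ship with the `TTS` pip package:

```bash
# List the available pretrained models
tts --list_models

# Synthesize with the default LJSpeech-based English model
tts --text "Hello, this is a test." --out_path output/speech.wav

# Synthesize with an explicitly chosen model
tts --text "Hello, this is a test." \
    --model_name "tts_models/en/ljspeech/tacotron2-DDC" \
    --out_path output/speech.wav
```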
If you have any questions or feedback, please contact the project maintainers:
- 0x_hayden
- Email: [email protected]
If you find this project helpful, consider buying us a coffee.
SpeechStylis AI is licensed under the Mozilla Public License. See the LICENSE file for details.