From e38dcbea7ad7cac4ca3e4eaeaa6d254d90e6ff35 Mon Sep 17 00:00:00 2001 From: Enno Hermann Date: Thu, 12 Dec 2024 17:34:00 +0100 Subject: [PATCH] docs: streamline readme and reuse content in other docs pages [ci skip] --- README.md | 232 +++++++++++++++++++----------------- TTS/bin/synthesize.py | 129 ++++++++++---------- docs/source/index.md | 5 +- docs/source/inference.md | 194 ++---------------------------- docs/source/installation.md | 36 +----- docs/source/server.md | 30 +++++ scripts/sync_readme.py | 6 +- 7 files changed, 235 insertions(+), 397 deletions(-) create mode 100644 docs/source/server.md diff --git a/README.md b/README.md index c766b51415..9ccf8657ab 100644 --- a/README.md +++ b/README.md @@ -1,39 +1,34 @@ -# 🐸Coqui TTS -## News -- 📣 Fork of the [original, unmaintained repository](https://github.com/coqui-ai/TTS). New PyPI package: [coqui-tts](https://pypi.org/project/coqui-tts) -- 📣 [OpenVoice](https://github.com/myshell-ai/OpenVoice) models now available for voice conversion. -- 📣 Prebuilt wheels are now also published for Mac and Windows (in addition to Linux as before) for easier installation across platforms. -- 📣 XTTSv2 is here with 17 languages and better performance across the board. XTTS can stream with <200ms latency. -- 📣 XTTS fine-tuning code is out. Check the [example recipes](https://github.com/idiap/coqui-ai-TTS/tree/dev/recipes/ljspeech). -- 📣 You can use [Fairseq models in ~1100 languages](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) with 🐸TTS. +# -## - -**🐸TTS is a library for advanced Text-to-Speech generation.** +**🐸 Coqui TTS is a library for advanced Text-to-Speech generation.** 🚀 Pretrained models in +1100 languages. 🛠️ Tools for training new models and fine-tuning existing models in any language. 📚 Utilities for dataset analysis and curation. 
-______________________________________________________________________ [![Discord](https://img.shields.io/discord/1037326658807533628?color=%239B59B6&label=chat%20on%20discord)](https://discord.gg/5eXr5seRrv) -![PyPI - Python Version](https://img.shields.io/pypi/pyversions/coqui-tts) +[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/coqui-tts)](https://pypi.org/project/coqui-tts/) [![License]()](https://opensource.org/licenses/MPL-2.0) -[![PyPI version](https://badge.fury.io/py/coqui-tts.svg)](https://badge.fury.io/py/coqui-tts) +[![PyPI version](https://badge.fury.io/py/coqui-tts.svg)](https://pypi.org/project/coqui-tts/) [![Downloads](https://pepy.tech/badge/coqui-tts)](https://pepy.tech/project/coqui-tts) [![DOI](https://zenodo.org/badge/265612440.svg)](https://zenodo.org/badge/latestdoi/265612440) - -![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/tests.yml/badge.svg) -![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/docker.yaml/badge.svg) -![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/style_check.yml/badge.svg) +[![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/tests.yml/badge.svg)](https://github.com/idiap/coqui-ai-TTS/actions/workflows/tests.yml) +[![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/docker.yaml/badge.svg)](https://github.com/idiap/coqui-ai-TTS/actions/workflows/docker.yaml) +[![GithubActions](https://github.com/idiap/coqui-ai-TTS/actions/workflows/style_check.yml/badge.svg)](https://github.com/idiap/coqui-ai-TTS/actions/workflows/style_check.yml) [![Docs]()](https://coqui-tts.readthedocs.io/en/latest/) -______________________________________________________________________ +## 📣 News +- **Fork of the [original, unmaintained repository](https://github.com/coqui-ai/TTS). 
New PyPI package: [coqui-tts](https://pypi.org/project/coqui-tts)** +- 0.25.0: [OpenVoice](https://github.com/myshell-ai/OpenVoice) models now available for voice conversion. +- 0.24.2: Prebuilt wheels are now also published for Mac and Windows (in addition to Linux as before) for easier installation across platforms. +- 0.20.0: XTTSv2 is here with 17 languages and better performance across the board. XTTS can stream with <200ms latency. +- 0.19.0: XTTS fine-tuning code is out. Check the [example recipes](https://github.com/idiap/coqui-ai-TTS/tree/dev/recipes/ljspeech). +- 0.14.1: You can use [Fairseq models in ~1100 languages](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) with 🐸TTS. ## 💬 Where to ask questions Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly so that more people can benefit from it. @@ -117,8 +112,10 @@ repository are also still a useful source of information. You can also help us implement more models. + ## Installation -🐸TTS is tested on Ubuntu 24.04 with **python >= 3.9, < 3.13.**, but should also + +🐸TTS is tested on Ubuntu 24.04 with **python >= 3.9, < 3.13**, but should also work on Mac and Windows. If you are only interested in [synthesizing speech](https://coqui-tts.readthedocs.io/en/latest/inference.html) with the pretrained 🐸TTS models, installing from PyPI is the easiest option. @@ -159,13 +156,15 @@ pip install -e .[server,ja] ### Platforms -If you are on Ubuntu (Debian), you can also run following commands for installation. +If you are on Ubuntu (Debian), you can also run the following commands for installation. ```bash -make system-deps # intended to be used on Ubuntu (Debian). Let us know if you have a different OS. +make system-deps make install ``` + + ## Docker Image You can also try out Coqui TTS without installation with the docker image. 
Simply run the following command and you will be able to run TTS: @@ -182,10 +181,10 @@ More details about the docker images (like GPU support) can be found ## Synthesizing speech by 🐸TTS - + ### 🐍 Python API -#### Running a multi-speaker and multi-lingual model +#### Multi-speaker and multi-lingual model ```python import torch @@ -197,47 +196,60 @@ device = "cuda" if torch.cuda.is_available() else "cpu" # List available 🐸TTS models print(TTS().list_models()) -# Init TTS +# Initialize TTS tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device) +# List speakers +print(tts.speakers) + # Run TTS -# ❗ Since this model is multi-lingual voice cloning model, we must set the target speaker_wav and language -# Text to speech list of amplitude values as output -wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en") -# Text to speech to a file -tts.tts_to_file(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav") +# ❗ XTTS supports both, but many models allow only one of the `speaker` and +# `speaker_wav` arguments + +# TTS with list of amplitude values as output, clone the voice from `speaker_wav` +wav = tts.tts( + text="Hello world!", + speaker_wav="my/cloning/audio.wav", + language="en" +) + +# TTS to a file, use a preset speaker +tts.tts_to_file( + text="Hello world!", + speaker="Craig Gutsy", + language="en", + file_path="output.wav" +) ``` -#### Running a single speaker model +#### Single speaker model ```python -# Init TTS with the target model name -tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False).to(device) +# Initialize TTS with the target model name +tts = TTS("tts_models/de/thorsten/tacotron2-DDC").to(device) # Run TTS tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH) - -# Example voice cloning with YourTTS in English, French and Portuguese -tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", 
progress_bar=False).to(device)
-tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
-tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr-fr", file_path="output.wav")
-tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt-br", file_path="output.wav")
 ```

-#### Example voice conversion
+#### Voice conversion (VC)

 Converting the voice in `source_wav` to the voice of `target_wav`

 ```python
-tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
-tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav")
+tts = TTS("voice_conversion_models/multilingual/vctk/freevc24").to("cuda")
+tts.voice_conversion_to_file(
+    source_wav="my/source.wav",
+    target_wav="my/target.wav",
+    file_path="output.wav"
+)
 ```

 Other available voice conversion models:
 - `voice_conversion_models/multilingual/multi-dataset/openvoice_v1`
 - `voice_conversion_models/multilingual/multi-dataset/openvoice_v2`

-#### Example voice cloning together with the default voice conversion model.
+#### Voice cloning by combining single speaker TTS model with the default VC model

 This way, you can clone voices by using any model in 🐸TTS. The FreeVC model
 is used for voice conversion after synthesizing speech.

@@ -252,7 +264,7 @@ tts.tts_with_vc_to_file(
 )
 ```

-#### Example text to speech using **Fairseq models in ~1100 languages** 🤯.
+#### TTS using Fairseq models in ~1100 languages 🤯
 For Fairseq models, use the following name format: `tts_models/<lang-iso-code>/fairseq/vits`.
 You can find the language ISO codes [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html)
 and learn about the Fairseq models [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mms).
@@ -266,7 +278,7 @@ api.tts_to_file(
 )
 ```

-### Command-line `tts`
+### Command-line interface `tts`

@@ -274,120 +286,118 @@ Synthesize speech on the command line.

 You can either use your trained model or choose a model from the provided list.

-If you don't specify any models, then it uses a Tacotron2 English model trained
-on LJSpeech.
-
-#### Single Speaker Models
-
 - List provided models:

+  ```sh
+  tts --list_models
   ```
-  $ tts --list_models
-  ```
-
-- Get model info (for both tts_models and vocoder_models):
-
-  - Query by type/name:
-    The model_info_by_name uses the name as it from the --list_models.
-    ```
-    $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
-    ```
-    For example:
-    ```
-    $ tts --model_info_by_name tts_models/tr/common-voice/glow-tts
-    $ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
-    ```
-  - Query by type/idx:
-    The model_query_idx uses the corresponding idx from --list_models.
-
-    ```
-    $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
-    ```
-    For example:
-
-    ```
-    $ tts --model_info_by_idx tts_models/3
-    ```
+- Get model information. Use the names obtained from `--list_models`.
+  ```sh
+  tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
+  ```
+  For example:
+  ```sh
+  tts --model_info_by_name tts_models/tr/common-voice/glow-tts
+  tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
+  ```

-  - Query info for model info by full name:
-    ```
-    $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
-    ```

+#### Single speaker models

-- Run TTS with default models:
+- Run TTS with the default model (`tts_models/en/ljspeech/tacotron2-DDC`):

-  ```
-  $ tts --text "Text for TTS" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" --out_path output/path/speech.wav
   ```

 - Run TTS and pipe out the generated TTS wav file data:

-  ```
-  $ tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
+  ```sh
+  tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
   ```

 - Run a TTS model with its default vocoder model:

-  ```
-  $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \
+      --model_name "<model_type>/<language>/<dataset>/<model_name>" \
+      --out_path output/path/speech.wav
   ```

   For example:

-  ```
-  $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \
+      --model_name "tts_models/en/ljspeech/glow-tts" \
+      --out_path output/path/speech.wav
   ```

-- Run with specific TTS and vocoder models from the list:
+- Run with specific TTS and vocoder models from the list. Note that not every vocoder is compatible with every TTS model.
-  ```
-  $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \
+      --model_name "<model_type>/<language>/<dataset>/<model_name>" \
+      --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" \
+      --out_path output/path/speech.wav
   ```

   For example:

-  ```
-  $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \
+      --model_name "tts_models/en/ljspeech/glow-tts" \
+      --vocoder_name "vocoder_models/en/ljspeech/univnet" \
+      --out_path output/path/speech.wav
   ```

-- Run your own TTS model (Using Griffin-Lim Vocoder):
+- Run your own TTS model (using Griffin-Lim Vocoder):

-  ```
-  $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \
+      --model_path path/to/model.pth \
+      --config_path path/to/config.json \
+      --out_path output/path/speech.wav
   ```

 - Run your own TTS and Vocoder models:

-  ```
-  $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
-      --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
+  ```sh
+  tts --text "Text for TTS" \
+      --model_path path/to/model.pth \
+      --config_path path/to/config.json \
+      --out_path output/path/speech.wav \
+      --vocoder_path path/to/vocoder.pth \
+      --vocoder_config_path path/to/vocoder_config.json
   ```

-#### Multi-speaker Models
+#### Multi-speaker models

-- List the available speakers and choose a <speaker_id> among them:
+- List the available speakers and choose a `<speaker_id>` among them:

-  ```
-  $ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
+  ```sh
+  tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
   ```

 - Run the multi-speaker TTS model with the target speaker ID:

-  ```
-  $ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
+  ```sh
+  tts --text "Text for TTS."
--out_path output/path/speech.wav \
+      --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
   ```

 - Run your own multi-speaker TTS model:

-  ```
-  $ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
+  ```sh
+  tts --text "Text for TTS" --out_path output/path/speech.wav \
+      --model_path path/to/model.pth --config_path path/to/config.json \
+      --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
   ```

-### Voice Conversion Models
+#### Voice conversion models

-```
-$ tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/target/wav>
+```sh
+tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" \
+    --source_wav <path/to/speaker/wav> --target_wav <path/to/target/wav>
 ```
diff --git a/TTS/bin/synthesize.py b/TTS/bin/synthesize.py
index 885f6d6f0c..5fce93b7f4 100755
--- a/TTS/bin/synthesize.py
+++ b/TTS/bin/synthesize.py
@@ -14,123 +14,122 @@ logger = logging.getLogger(__name__)
 description = """
-Synthesize speech on command line.
+Synthesize speech on the command line.

 You can either use your trained model or choose a model from the provided list.

-If you don't specify any models, then it uses LJSpeech based English model.
-
-#### Single Speaker Models
-
 - List provided models:

+  ```sh
+  tts --list_models
   ```
-  $ tts --list_models
-  ```
-
-- Get model info (for both tts_models and vocoder_models):
-
-  - Query by type/name:
-    The model_info_by_name uses the name as it from the --list_models.
-    ```
-    $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
-    ```
-    For example:
-    ```
-    $ tts --model_info_by_name tts_models/tr/common-voice/glow-tts
-    $ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
-    ```
-  - Query by type/idx:
-    The model_query_idx uses the corresponding idx from --list_models.
-    ```
-    $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
-    ```
-
-    For example:
-
-    ```
-    $ tts --model_info_by_idx tts_models/3
-    ```
+- Get model information. Use the names obtained from `--list_models`.
+  ```sh
+  tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
+  ```
+  For example:
+  ```sh
+  tts --model_info_by_name tts_models/tr/common-voice/glow-tts
+  tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
+  ```

-  - Query info for model info by full name:
-    ```
-    $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
-    ```

+#### Single Speaker Models

-- Run TTS with default models:
+- Run TTS with the default model (`tts_models/en/ljspeech/tacotron2-DDC`):

-  ```
-  $ tts --text "Text for TTS" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" --out_path output/path/speech.wav
   ```

 - Run TTS and pipe out the generated TTS wav file data:

-  ```
-  $ tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
+  ```sh
+  tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
   ```

 - Run a TTS model with its default vocoder model:

-  ```
-  $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \\
+      --model_name "<model_type>/<language>/<dataset>/<model_name>" \\
+      --out_path output/path/speech.wav
   ```

   For example:

-  ```
-  $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \\
+      --model_name "tts_models/en/ljspeech/glow-tts" \\
+      --out_path output/path/speech.wav
   ```

-- Run with specific TTS and vocoder models from the list:
+- Run with specific TTS and vocoder models from the list. Note that not every vocoder is compatible with every TTS model.
-  ```
-  $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \\
+      --model_name "<model_type>/<language>/<dataset>/<model_name>" \\
+      --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" \\
+      --out_path output/path/speech.wav
   ```

   For example:

-  ```
-  $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \\
+      --model_name "tts_models/en/ljspeech/glow-tts" \\
+      --vocoder_name "vocoder_models/en/ljspeech/univnet" \\
+      --out_path output/path/speech.wav
   ```

-- Run your own TTS model (Using Griffin-Lim Vocoder):
+- Run your own TTS model (using Griffin-Lim Vocoder):

-  ```
-  $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
+  ```sh
+  tts --text "Text for TTS" \\
+      --model_path path/to/model.pth \\
+      --config_path path/to/config.json \\
+      --out_path output/path/speech.wav
   ```

 - Run your own TTS and Vocoder models:

-  ```
-  $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
-      --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
+  ```sh
+  tts --text "Text for TTS" \\
+      --model_path path/to/model.pth \\
+      --config_path path/to/config.json \\
+      --out_path output/path/speech.wav \\
+      --vocoder_path path/to/vocoder.pth \\
+      --vocoder_config_path path/to/vocoder_config.json
   ```

 #### Multi-speaker Models

-- List the available speakers and choose a <speaker_id> among them:
+- List the available speakers and choose a `<speaker_id>` among them:

-  ```
-  $ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
+  ```sh
+  tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
   ```

 - Run the multi-speaker TTS model with the target speaker ID:

-  ```
-  $ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
+  ```sh
+  tts --text "Text for TTS."
--out_path output/path/speech.wav \\ + --model_name "//" --speaker_idx ``` - Run your own multi-speaker TTS model: - ``` - $ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx + ```sh + tts --text "Text for TTS" --out_path output/path/speech.wav \\ + --model_path path/to/model.pth --config_path path/to/config.json \\ + --speakers_file_path path/to/speaker.json --speaker_idx ``` -### Voice Conversion Models +#### Voice Conversion Models -``` -$ tts --out_path output/path/speech.wav --model_name "//" --source_wav --target_wav +```sh +tts --out_path output/path/speech.wav --model_name "//" \\ + --source_wav --target_wav ``` """ diff --git a/docs/source/index.md b/docs/source/index.md index ae34771c68..3a030b4f81 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -1,8 +1,11 @@ +--- +hide-toc: true +--- ```{include} ../../README.md :relative-images: +:end-before: ``` ----- ```{toctree} :maxdepth: 1 diff --git a/docs/source/inference.md b/docs/source/inference.md index ccce84b08b..cb7d01fca3 100644 --- a/docs/source/inference.md +++ b/docs/source/inference.md @@ -1,199 +1,21 @@ (synthesizing_speech)= # Synthesizing speech -First, you need to install TTS. We recommend using PyPi. You need to call the command below: +## Overview -```bash -$ pip install coqui-tts -``` - -After the installation, 2 terminal commands are available. - -1. TTS Command Line Interface (CLI). - `tts` -2. Local Demo Server. - `tts-server` -3. In 🐍Python. - `from TTS.api import TTS` - -## On the Commandline - `tts` -![cli.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/tts_cli.gif) - -After the installation, 🐸TTS provides a CLI interface for synthesizing speech using pre-trained models. You can either use your own model or the release models under 🐸TTS. - -Listing released 🐸TTS models. 
- -```bash -tts --list_models -``` +Coqui TTS provides three main methods for inference: -Run a TTS model, from the release models list, with its default vocoder. (Simply copy and paste the full model names from the list as arguments for the command below.) +1. 🐍Python API +2. TTS command line interface (CLI) +3. [Local demo server](server.md) -```bash -tts --text "Text for TTS" \ - --model_name "///" \ - --out_path folder/to/save/output.wav +```{include} ../../README.md +:start-after: ``` -Run a tts and a vocoder model from the released model list. Note that not every vocoder is compatible with every TTS model. - -```bash -tts --text "Text for TTS" \ - --model_name "tts_models///" \ - --vocoder_name "vocoder_models///" \ - --out_path folder/to/save/output.wav -``` - -Run your own TTS model (Using Griffin-Lim Vocoder) - -```bash -tts --text "Text for TTS" \ - --model_path path/to/model.pth \ - --config_path path/to/config.json \ - --out_path folder/to/save/output.wav -``` - -Run your own TTS and Vocoder models - -```bash -tts --text "Text for TTS" \ - --config_path path/to/config.json \ - --model_path path/to/model.pth \ - --out_path folder/to/save/output.wav \ - --vocoder_path path/to/vocoder.pth \ - --vocoder_config_path path/to/vocoder_config.json -``` - -Run a multi-speaker TTS model from the released models list. - -```bash -tts --model_name "tts_models///" --list_speaker_idxs # list the possible speaker IDs. -tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "tts_models///" --speaker_idx "" -``` - -Run a released voice conversion model - -```bash -tts --model_name "voice_conversion///" - --source_wav "my/source/speaker/audio.wav" - --target_wav "my/target/speaker/audio.wav" - --out_path folder/to/save/output.wav -``` - -**Note:** You can use ```./TTS/bin/synthesize.py``` if you prefer running ```tts``` from the TTS project folder. 
- -## On the Demo Server - `tts-server` - - -![server.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/demo_server.gif) - -You can boot up a demo 🐸TTS server to run an inference with your models (make -sure to install the additional dependencies with `pip install coqui-tts[server]`). -Note that the server is not optimized for performance and does not support all -Coqui models yet. - -The demo server provides pretty much the same interface as the CLI command. - -```bash -tts-server -h # see the help -tts-server --list_models # list the available models. -``` - -Run a TTS model, from the release models list, with its default vocoder. -If the model you choose is a multi-speaker TTS model, you can select different speakers on the Web interface and synthesize -speech. - -```bash -tts-server --model_name "///" -``` - -Run a TTS and a vocoder model from the released model list. Note that not every vocoder is compatible with every TTS model. - -```bash -tts-server --model_name "///" \ - --vocoder_name "///" -``` - -## Python 🐸TTS API - -You can run a multi-speaker and multi-lingual model in Python as - -```python -import torch -from TTS.api import TTS - -# Get device -device = "cuda" if torch.cuda.is_available() else "cpu" - -# List available 🐸TTS models -print(TTS().list_models()) - -# Init TTS -tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device) - -# Run TTS -# ❗ Since this model is multi-lingual voice cloning model, we must set the target speaker_wav and language -# Text to speech list of amplitude values as output -wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en") -# Text to speech to a file -tts.tts_to_file(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav") -``` - -### Single speaker model. 
- -```python -# Init TTS with the target model name -tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False) -# Run TTS -tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH) -``` - -### Voice cloning with YourTTS in English, French and Portuguese: - -```python -tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to("cuda") -tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav") -tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr", file_path="output.wav") -tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt", file_path="output.wav") -``` - -### Voice conversion from the speaker of `source_wav` to the speaker of `target_wav` - -```python -tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda") -tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav") -``` - -### Voice cloning by combining single speaker TTS model with the voice conversion model. - -This way, you can clone voices by using any model in 🐸TTS. - -```python -tts = TTS("tts_models/de/thorsten/tacotron2-DDC") -tts.tts_with_vc_to_file( - "Wie sage ich auf Italienisch, dass ich dich liebe?", - speaker_wav="target/speaker.wav", - file_path="ouptut.wav" -) -``` - -### Text to speech using **Fairseq models in ~1100 languages** 🤯. -For these models use the following name format: `tts_models//fairseq/vits`. - -You can find the list of language ISO codes [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html) and learn about the Fairseq models [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mms). 
- -```python -from TTS.api import TTS -api = TTS(model_name="tts_models/eng/fairseq/vits").to("cuda") -api.tts_to_file("This is a test.", file_path="output.wav") - -# TTS with on the fly voice conversion -api = TTS("tts_models/deu/fairseq/vits") -api.tts_with_vc_to_file( - "Wie sage ich auf Italienisch, dass ich dich liebe?", - speaker_wav="target/speaker.wav", - file_path="ouptut.wav" -) -``` ```{toctree} :hidden: +server marytts ``` diff --git a/docs/source/installation.md b/docs/source/installation.md index 5becc28b70..1315395a59 100644 --- a/docs/source/installation.md +++ b/docs/source/installation.md @@ -1,36 +1,6 @@ # Installation -🐸TTS supports python >=3.9 <3.13.0 and was tested on Ubuntu 24.04, but should -also run on Mac and Windows. - -## Using `pip` - -`pip` is recommended if you want to use 🐸TTS only for inference. - -You can install from PyPI as follows: - -```bash -pip install coqui-tts # from PyPI -``` - -Or install from Github: - -```bash -pip install git+https://github.com/idiap/coqui-ai-TTS # from Github -``` - -## Installing From Source - -This is recommended for development and more control over 🐸TTS. - -```bash -git clone https://github.com/idiap/coqui-ai-TTS -cd coqui-ai-TTS -make system-deps # only on Linux systems. - -# Install package and optional extras -make install - -# Same as above + dev dependencies and pre-commit -make install_dev +```{include} ../../README.md +:start-after: +:end-before: ``` diff --git a/docs/source/server.md b/docs/source/server.md new file mode 100644 index 0000000000..3fa211d0d7 --- /dev/null +++ b/docs/source/server.md @@ -0,0 +1,30 @@ +# Demo server + +![server.gif](https://github.com/idiap/coqui-ai-TTS/raw/main/images/demo_server.gif) + +You can boot up a demo 🐸TTS server to run an inference with your models (make +sure to install the additional dependencies with `pip install coqui-tts[server]`). +Note that the server is not optimized for performance and does not support all +Coqui models yet. 
+
+The demo server provides pretty much the same interface as the CLI command.
+
+```bash
+tts-server -h # see the help
+tts-server --list_models # list the available models.
+```
+
+Run a TTS model from the released models list with its default vocoder.
+If the model you choose is a multi-speaker TTS model, you can select different
+speakers on the Web interface and synthesize speech.
+
+```bash
+tts-server --model_name "<model_type>/<language>/<dataset>/<model_name>"
+```
+
+Run a TTS and a vocoder model from the released model list. Note that not every
+vocoder is compatible with every TTS model.
+
+```bash
+tts-server --model_name "<model_type>/<language>/<dataset>/<model_name>" \
+    --vocoder_name "<model_type>/<language>/<dataset>/<model_name>"
+```
diff --git a/scripts/sync_readme.py b/scripts/sync_readme.py
index 584286814b..97256bca6d 100644
--- a/scripts/sync_readme.py
+++ b/scripts/sync_readme.py
@@ -22,8 +22,12 @@ def sync_readme():
     new_content = replace_between_markers(orig_content, "tts-readme", description.strip())
     if args.check:
         if orig_content != new_content:
-            print("README.md is out of sync; please edit TTS/bin/TTS_README.md and run scripts/sync_readme.py")
+            print(
+                "README.md is out of sync; please reconcile README.md and TTS/bin/synthesize.py and run scripts/sync_readme.py"
+            )
             exit(42)
+        print("All good, files in sync")
+        exit(0)
     readme_path.write_text(new_content)
     print("Updated README.md")
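The `scripts/sync_readme.py` changes above rely on `replace_between_markers` to copy the `tts` CLI description from `TTS/bin/synthesize.py` into the marked region of `README.md`. As a rough illustration of how such marker-based syncing works — the helper name matches the script, but this regex-based body and the exact `<!-- start/end tts-readme -->` marker format are assumptions, not the script's actual implementation:

```python
import re


def replace_between_markers(content: str, marker: str, replacement: str) -> str:
    """Replace everything between <!-- start {marker} --> and <!-- end {marker} -->.

    The markers themselves are kept, so the region can be re-synced repeatedly.
    """
    start = f"<!-- start {marker} -->"
    end = f"<!-- end {marker} -->"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    # Use a callable replacement so backslashes in `replacement` are taken literally.
    return pattern.sub(lambda _: f"{start}\n\n{replacement}\n\n{end}", content)


# Example: stale README content between markers gets replaced wholesale.
readme = "# Title\n<!-- start tts-readme -->\nstale docs\n<!-- end tts-readme -->\n"
synced = replace_between_markers(readme, "tts-readme", "fresh docs")
print(synced)
```

Because the substitution is idempotent over the marked region, running the sync twice yields the same output — which is what lets the `--check` mode in the patch compare `orig_content != new_content` to detect drift.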