PP-TTS is a streaming speech synthesis system developed by PaddleSpeech. It builds on implementations of state-of-the-art algorithms and uses a faster inference engine to deliver streaming speech synthesis that meets the needs of commercial speech interaction scenarios.
Pipeline of TTS:
PP-TTS provides a Chinese streaming speech synthesis system based on FastSpeech2 and HiFiGAN by default:
- Text Frontend: A rule-based Chinese text frontend handles text normalization, polyphone disambiguation, and tone sandhi.
- Acoustic Model: The decoder of FastSpeech2 is modified so that it supports streaming synthesis.
- Vocoder: Streaming synthesis with the GAN vocoder is supported.
- Inference Engine: ONNXRuntime is used to optimize inference of the TTS models, so that the system achieves a real-time factor (RTF) below 1 even on low-voltage CPUs, which meets the requirements of streaming synthesis.
- A leading open-source Chinese TTS system
- Uses ONNXRuntime to optimize the inference of TTS models
- The only open-source streaming TTS system
- Easy to disassemble: Developers can swap in acoustic models and vocoders for different languages, use different inference engines (Paddle dynamic graph, PaddleInference, ONNXRuntime, etc.), and use different network services (HTTP, WebSocket)
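To make the streaming and RTF points above concrete, here is a minimal sketch of chunked vocoding with context padding, plus the real-time factor calculation. It is not the PaddleSpeech implementation: `chunk_bounds`, `stream_vocode`, `real_time_factor`, `toy_vocoder`, and the chunk, pad, and hop sizes are all hypothetical names and values chosen for illustration, and the toy vocoder simply repeats each frame value in place of a real model.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

HOP = 3  # assumed number of audio samples per mel frame (illustrative only)


def chunk_bounds(num_frames: int, chunk_size: int, pad: int) -> List[Tuple[int, int, int, int]]:
    """Return (ctx_start, start, end, ctx_end) for each chunk of frames."""
    bounds = []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        # Extend the window by up to `pad` context frames on each side.
        bounds.append((max(0, start - pad), start, end, min(num_frames, end + pad)))
    return bounds


def stream_vocode(
    mel: Sequence[float],
    vocoder: Callable[[Sequence[float]], List[float]],
    chunk_size: int = 4,
    pad: int = 2,
    hop: int = HOP,
) -> Iterator[List[float]]:
    """Yield audio chunk by chunk instead of waiting for the full utterance."""
    for ctx_start, start, end, ctx_end in chunk_bounds(len(mel), chunk_size, pad):
        audio = vocoder(mel[ctx_start:ctx_end])
        # Drop the samples that were generated from the context frames.
        left = (start - ctx_start) * hop
        yield audio[left:left + (end - start) * hop]


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    return synthesis_seconds / (num_samples / sample_rate)


def toy_vocoder(frames: Sequence[float]) -> List[float]:
    """Stand-in for a real vocoder: repeat each frame value HOP times."""
    return [f for f in frames for _ in range(HOP)]


if __name__ == "__main__":
    mel = list(range(10))
    for chunk in stream_vocode(mel, toy_vocoder):
        print(len(chunk), "samples ready to play")
    # Example values: 0.5 s of compute for 1 s of audio at an assumed 24 kHz.
    print("RTF:", real_time_factor(0.5, 24000, 24000))
```

An RTF below 1 means audio is produced faster than it plays back, which is the condition a streaming system needs so that playback never stalls while waiting for the next chunk. The context padding is one common way to keep chunk boundaries consistent with full-utterance synthesis; a real model's receptive field determines how much padding is needed.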
Benchmark of PaddleSpeech TTS models: TTS-Benchmark.
Default FastSpeech2: tts3/run.sh
Streaming FastSpeech2: tts3/run_cnndecoder.sh
HiFiGAN: voc5/run.sh
text_to_speech - convert text into speech: text_to_speech
style_fs2 - multi style control for FastSpeech2 model: style_fs2
story talker - book reader based on OCR and TTS: story_talker
metaverse - 2D AR with TTS: metaverse
Non-streaming TTS Server: speech_server
Streaming TTS Server: streaming_tts_server
For more tutorials, see: PP-TTS: Principles of Streaming Speech Synthesis and Service Deployment (in Chinese)