How Neural TTS works

Production

Neural TTS is the latest breakthrough in text-to-speech technology. The Azure Neural TTS team first released a Neural TTS-based production service in September 2018.

Our text-to-speech capability uses deep neural networks to overcome the limits of traditional text-to-speech systems in matching the patterns of stress and intonation in spoken language, called prosody, and in synthesizing the units of speech into a computer voice.

Using the computational power of Azure, we can deliver real-time streaming, which is useful for scenarios such as interacting with a chatbot or virtual assistant. The capability is hosted on Azure Kubernetes Service, which provides high scalability and availability and lets customers use neural text-to-speech and traditional text-to-speech from a single endpoint.
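As an illustration of how a client might ask for a particular voice and nudge prosody, the sketch below builds an SSML request body with the Python standard library only. It does not call the service. The helper name `build_ssml`, the voice name `en-US-JennyNeural`, and the default values are assumptions chosen for the example; check the Azure Speech documentation for the voices and prosody values actually supported.

```python
import xml.etree.ElementTree as ET

# W3C SSML namespace and the predefined XML namespace (for xml:lang).
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US",
               rate="medium", pitch="medium"):
    """Return an SSML string wrapping `text` in <voice> and <prosody>.

    Hypothetical helper: the voice name and defaults are illustrative
    assumptions, not values taken from this page.
    """
    # Serialize the SSML namespace as the default xmlns (no prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    prosody = ET.SubElement(
        voice_el, f"{{{SSML_NS}}}prosody", {"rate": rate, "pitch": pitch}
    )
    # ElementTree escapes characters such as & and < when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello, how can I help you today?", rate="fast"))
```

In this sketch the voice is selected by name inside the document, so switching between voices would only change the `voice` argument, not the surrounding request structure.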

Research

Neural TTS research has been active over the last three years. The team keeps pushing the state of the art in neural TTS research.
A few selected papers:

Neural Speech Synthesis with Transformer Network, AAAI 2019
FastSpeech: Fast, Robust and Controllable Text to Speech, NeurIPS 2019
RobuTrans: A Robust Transformer-Based Text-to-Speech Model, AAAI 2020 (to appear)

Azure TTS: Empower every person and every organization on the planet to have a delightful digital voice!
Azure Custom Voice: Build your one-of-a-kind Custom Voice and close to human Neural TTS in cloud and edge!

Azure Speech Document

Create Custom Neural Voice

Speech SDK

Azure Speech Containers
