
Systematic comparison between articulatory TTS synthesis with VocalTractLab and DNN, HMM, unit-selection and diphone TTS syntheses from various systems.


TUD-STKS/TTS_Comparison_SSW21


Intelligibility and naturalness of articulatory text-to-speech synthesis compared to established speech synthesis technologies

This repository contains the supplementary materials to the paper:

P. K. Krug et al., "Intelligibility and naturalness of articulatory synthesis with VocalTractLab compared to established speech synthesis technologies" in SSW11, 2021 (Submitted).

What is this repository for?

This repository contains the audio files used in the study described in the paper above. It also provides the data and information needed to reproduce the audio samples.

Requirements

If you would like to reproduce the audio samples presented in this repository, you will need access to the respective TTS and re-synthesis systems. For Azure [1] (neural and standard), Google [2] (neural and standard) and MaryTTS [3], you can use the respective online demo services. For MBROLA-Res, you will have to set up the open-source software MBROLA [4] and synthesize the utterances using the ".pho" files provided in this repository. For VTL-Res and VTL-TTS, the utterances can be reproduced using the open-source software VocalTractLab [5] and the provided ".seg" and ".ges" files. For DRESS, you will have to set up the DRESS TTS system.
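For MBROLA-Res, batch re-synthesis can be scripted around the standard MBROLA command line, which takes a voice database, an input ".pho" file and an output file (`mbrola <voice> <input.pho> <output.wav>`). The sketch below is only an illustration under stated assumptions: the folder path and the voice database path are hypothetical placeholders, and the voice actually used in the study should be taken from the "Info.txt" file of the MBROLA-Res folder.

```python
import subprocess
from pathlib import Path


def mbrola_commands(pho_dir, voice, out_dir):
    """Build one MBROLA call per ".pho" file: mbrola <voice> <in.pho> <out.wav>.

    `voice` is the path to an installed MBROLA voice database (assumption:
    pick the voice named in the folder's Info.txt).
    """
    pho_dir, out_dir = Path(pho_dir), Path(out_dir)
    commands = []
    for pho in sorted(pho_dir.glob("*.pho")):
        # Keep the utterance name so outputs can be compared with the repo's .wav files.
        wav = out_dir / (pho.stem + ".wav")
        commands.append(["mbrola", str(voice), str(pho), str(wav)])
    return commands


def synthesize_all(pho_dir, voice, out_dir):
    """Run MBROLA for every ".pho" file; requires the `mbrola` binary on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in mbrola_commands(pho_dir, voice, out_dir):
        subprocess.run(cmd, check=True)  # raise if MBROLA reports an error
```

A hypothetical call would be `synthesize_all("MBROLA-Res/data", "/path/to/voice/database", "mbrola_out")`. Building the command list separately from running it makes it easy to inspect the calls before launching the synthesis.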

For more information regarding each type of synthesis, see the "Info.txt" file provided in each folder.

File description

This repository is organized as follows:

  • supplementary materials: Contains all relevant audio files used in the experiments as well as the data and information needed to reproduce the samples.

    • "Name of the synthesis system": Contains all audio samples of a specific type.

      • "Name of the utterance".wav: The audio file of the respective utterance.

      • Info.txt: Information regarding the synthesis, the settings of the TTS system, etc.

      • data (if necessary): Contains the data necessary to reproduce the syntheses.

        • "Name of the utterance".datafile: Datafiles are either ".pho", ".ges", or ".seg" files.

    • Example sentence: Used at the beginning of the listening experiment to set the volume level.
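When working with the materials programmatically, it can help to pair each audio file with its reproduction data. The following sketch assumes that a system folder contains the utterance ".wav" files directly, that the optional data folder is literally named "data", and that each datafile shares its base name with the corresponding ".wav" file; these naming details are assumptions and should be checked against the actual folders.

```python
from pathlib import Path

# Datafile types listed in this README.
DATA_SUFFIXES = {".pho", ".ges", ".seg"}


def index_system(system_dir):
    """Map each utterance name to its .wav file and any matching datafiles.

    Assumes datafiles live in a sub-folder called "data" and share the
    utterance's base name (e.g. "x.wav" <-> "data/x.ges", "data/x.seg").
    """
    system_dir = Path(system_dir)
    index = {}
    for wav in sorted(system_dir.glob("*.wav")):
        index[wav.stem] = {"wav": wav, "data": []}
    data_dir = system_dir / "data"
    if data_dir.is_dir():  # the data folder only exists where it is needed
        for f in sorted(data_dir.iterdir()):
            if f.suffix in DATA_SUFFIXES and f.stem in index:
                index[f.stem]["data"].append(f)
    return index
```

Utterances whose "data" list stays empty are those that, like the Azure, Google or MaryTTS samples, have no datafiles and must be regenerated through the respective service.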

How to cite this work

This work is distributed under the GNU GPL 3.0 License. If you use parts of this work in your own work, please cite the following reference:

  • P. K. Krug et al., "Intelligibility and naturalness of articulatory synthesis with VocalTractLab compared to established speech synthesis technologies" in SSW11, 2021 (Submitted)

References

[1] Microsoft Azure Text to Speech: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/

[2] Google Cloud Text-to-Speech: https://cloud.google.com/text-to-speech

[3] MaryTTS: http://mary.dfki.de:59125/

[4] MBROLA: https://github.com/numediart/MBROLA

[5] VocalTractLab: https://vocaltractlab.de/
