Intelligibility and naturalness of articulatory text-to-speech synthesis compared to established speech synthesis technologies

This repository contains the supplementary materials to the paper:

P. K. Krug et al., "Intelligibility and naturalness of articulatory synthesis with VocalTractLab compared to established speech synthesis technologies" in SSW11, 2021 (Submitted).

What is this repository for?

This repository contains the audio files used in the study described in the paper above. It also provides the data and information needed to reproduce the audio samples.

Requirements

To reproduce the audio samples in this repository, you need access to the respective TTS and re-synthesis systems. For Azure¹ (neural and standard), Google² (neural and standard), and MaryTTS³, you can use the respective online demo services. For MBROLA-Res, set up the open-source software MBROLA⁴ and synthesize the utterances from the ".pho" files provided in this repository. For VTL-Res and VTL-TTS, the utterances can be reproduced with the open-source software VocalTractLab⁵ and the provided ".seg" and ".ges" files. For DRESS, you will have to set up the DRESS TTS system.

For more information on each type of synthesis, see the "Info.txt" file provided in each folder.
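
As a rough illustration of the MBROLA-Res step, the following Python sketch builds one MBROLA command line per ".pho" file. It assumes the usual MBROLA invocation order (voice database, input ".pho" file, output audio file). The function name `mbrola_commands`, the folder paths, and the voice database path are hypothetical placeholders; the voice and settings actually used should be taken from the corresponding "Info.txt". The sketch only assembles the commands and does not execute anything.

```python
from pathlib import Path


def mbrola_commands(pho_dir, out_dir, voice_db, mbrola_bin="mbrola"):
    """Build one MBROLA command line per .pho file (nothing is executed).

    pho_dir    -- folder holding the ".pho" files (hypothetical path)
    out_dir    -- folder in which the ".wav" files should be written
    voice_db   -- path to an MBROLA voice database (assumption: see Info.txt)
    mbrola_bin -- name or path of the MBROLA executable
    """
    pho_dir, out_dir = Path(pho_dir), Path(out_dir)
    commands = []
    # Sort the files so the order of the commands is reproducible.
    for pho in sorted(pho_dir.glob("*.pho")):
        wav = out_dir / (pho.stem + ".wav")
        commands.append([mbrola_bin, str(voice_db), str(pho), str(wav)])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once MBROLA and a voice database are installed and the output folder exists.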

File description

This repository is organized as follows:

supplementary materials: Contains all relevant audio files used in the experiments as well as the data and information needed to reproduce the samples.

└─── Name of the synthesis system: Contains all audio samples of a specific type.
- └─── "Name of the utterance".wav: The audio file of the respective utterance.
- └─── Info.txt: Information regarding the synthesis, settings of the TTS system, etc.
- └─── (If necessary) data: Contains the data necessary to reproduce the syntheses.
  - └─── "Name of the utterance".datafile: Data files are either ".pho", ".ges", or ".seg".
└─── Example sentence: Used at the beginning of the listening experiment to set the volume level.
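
The layout above can be checked programmatically. The following Python sketch walks a local copy of the "supplementary materials" folder and lists, for each subfolder, its audio files, its data files, and whether an "Info.txt" is present. The function name `index_materials` is hypothetical, and the sketch assumes that the data subfolder is literally named "data", as the description suggests.

```python
from pathlib import Path

# Data file types named in the file description.
DATA_SUFFIXES = {".pho", ".ges", ".seg"}


def index_materials(root):
    """Map each subfolder of `root` to its audio, data, and info files."""
    index = {}
    for system_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        data_dir = system_dir / "data"  # assumption: folder is named "data"
        if data_dir.is_dir():
            data_files = sorted(
                f.name for f in data_dir.iterdir() if f.suffix in DATA_SUFFIXES
            )
        else:
            data_files = []  # systems without reproduction data
        index[system_dir.name] = {
            "audio": sorted(f.name for f in system_dir.glob("*.wav")),
            "data": data_files,
            "has_info": (system_dir / "Info.txt").is_file(),
        }
    return index
```

Comparing the "audio" and "data" lists per system is a quick way to spot utterances whose reproduction data is missing from a local copy.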

How to cite this work

This work is distributed under the GNU GPL 3.0 License. If you use parts of this work in your own work, please cite the following reference:

P. K. Krug et al., "Intelligibility and naturalness of articulatory synthesis with VocalTractLab compared to established speech synthesis technologies" in SSW11, 2021 (Submitted).

References

1 https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/

2 https://cloud.google.com/text-to-speech

3 http://mary.dfki.de:59125/

4 https://github.com/numediart/MBROLA

5 https://vocaltractlab.de/
