Training TTS For a New Language #1198
-
I guess that to get past the error you may need to keep the original trained model's character set too. After that, see how it goes.
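Something like this is what I mean by keeping the character set (a rough sketch; the CharactersConfig field names may differ between TTS versions, and both character strings below are placeholders):

```python
from TTS.tts.configs.shared_configs import CharactersConfig

# Placeholders: fill these with the checkpoint's original symbols plus the
# Devanagari characters that actually appear in the new dataset.
ORIGINAL_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
HINDI_CHARS = "अआइईउऊएऐओऔकखगघचछजझटठडढणतथदधनपफबभमयरलवशषसह"

characters = CharactersConfig(
    pad="<PAD>",
    eos="<EOS>",
    bos="<BOS>",
    characters=ORIGINAL_CHARS + HINDI_CHARS,
    punctuations="!'(),-.:;? ",
)
```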
Also, the model you are fine-tuning was probably trained with a speaker encoder, so I guess it is better to generate d-vectors for every utterance in your dataset and pass them to the model rather than using the speaker_embedding layer. (There are properties in config.json to set this appropriately.) Hope this information helps.
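Roughly, the settings I mean look like this (a sketch for a VITS-style config; the field names and the 512 dimension are assumptions you should check against your TTS version):

```python
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    use_speaker_embedding=False,     # don't use the internal speaker-embedding layer
    use_d_vector_file=True,          # feed precomputed per-utterance d-vectors instead
    d_vector_file="d_vectors.json",  # hypothetical path to the embeddings you generate
    d_vector_dim=512,                # must match your speaker encoder's output size
)
config = VitsConfig(model_args=model_args)
```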
-
Since your character set is totally different, you might need to train from scratch, but restoring from the pre-trained model would at least help the vocoder learn faster.
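If it helps, restoring in the recipe-style scripts is typically just a matter of pointing the trainer at the checkpoint. A rough sketch, assuming `config`, `model` and the sample lists are already built as in the standard recipes, and with a placeholder checkpoint path:

```python
from trainer import Trainer, TrainerArgs

# Assumption: `config`, `model`, `train_samples` and `eval_samples` are built
# exactly as in the usual recipes; only restore_path is the new part here.
trainer = Trainer(
    TrainerArgs(restore_path="/path/to/pretrained_checkpoint.pth"),  # placeholder path
    config,
    output_path="output/",
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
```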
-
@jaggukaka Have you tried to train Telugu?
-
Hello, has anyone done it?
-
Could you please let me know how to train the TTS model for Telugu? I have been trying to do it using the AI4basic dataset. @jaggukaka
-
So, I want to train TTS for an Indian language, Hindi. After going through the docs, I would say I have figured out a few of the things needed to achieve this. But the more I read the Discussions, the more confused I get. Can someone please help me understand what exact steps are needed to train a new model on my dataset? Below is what I have done based on my understanding so far:
- Prepared my dataset in the ljspeech format, so that it has a metadata.csv and the actual recordings under a wavs folder.
- Created a config.json file, taking tutorial_for_nervous_beginners & the FAQs as reference, which is below.

Now, I tried following the YourTTS notebooks as well, but got stuck at the restoring model state dict step because of the error below:
RuntimeError: Error(s) in loading state_dict for Vits: size mismatch for text_encoder.emb.weight: copying a param with shape torch.Size([165, 192]) from checkpoint, the shape in current model is torch.Size([103, 192]).
Also:
- I am not sure whether I should train the model from scratch, or restore a pretrained model and fine-tune it, since my language and its character set are entirely different from those the pretrained models were trained on.
- Where does the speaker encoder model fit into this process? Is it computed automatically if I run training with the above config.json?

Can someone guide me here or point me to a resource on what I need to do to train a TTS model on my Hindi dataset? Thanks for reading this far!