Training TTS For a New Language #1198
-
I guess that to get past the error you may need to keep the original trained model's character set too. After that, see how it goes.
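Something like this is what I mean by keeping the character set (a rough sketch; the CharactersConfig field names may differ between TTS versions, and both character strings below are placeholders):

```python
from TTS.tts.configs.shared_configs import CharactersConfig

# Placeholders: fill these with the checkpoint's original symbols plus the
# Devanagari characters that actually appear in the new dataset.
ORIGINAL_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
HINDI_CHARS = "अआइईउऊएऐओऔकखगघचछजझटठडढणतथदधनपफबभमयरलवशषसह"

characters = CharactersConfig(
    pad="<PAD>",
    eos="<EOS>",
    bos="<BOS>",
    characters=ORIGINAL_CHARS + HINDI_CHARS,
    punctuations="!'(),-.:;? ",
)
```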
Also, the model you are fine-tuning was probably trained with a speaker encoder, so I guess it is better to generate d-vectors for every utterance in your dataset and pass them to the model rather than using the speaker_embedding layer. (There are properties in config.json to set this appropriately.) Hope this information helps.
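Roughly, the settings I mean look like this (a sketch for a VITS-style config; the field names and the 512 dimension are assumptions you should check against your TTS version):

```python
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    use_speaker_embedding=False,     # don't use the internal speaker-embedding layer
    use_d_vector_file=True,          # feed precomputed per-utterance d-vectors instead
    d_vector_file="d_vectors.json",  # hypothetical path to the embeddings you generate
    d_vector_dim=512,                # must match your speaker encoder's output size
)
config = VitsConfig(model_args=model_args)
```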
-
Since your character set is totally different, you might need to train from scratch, but restoring from the pre-trained model would at least help the vocoder learn faster.
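If it helps, restoring in the recipe-style scripts is typically just a matter of pointing the trainer at the checkpoint. A rough sketch, assuming `config`, `model` and the sample lists are already built as in the standard recipes, and with a placeholder checkpoint path:

```python
from trainer import Trainer, TrainerArgs

# Assumption: `config`, `model`, `train_samples` and `eval_samples` are built
# exactly as in the usual recipes; only restore_path is the new part here.
trainer = Trainer(
    TrainerArgs(restore_path="/path/to/pretrained_checkpoint.pth"),  # placeholder path
    config,
    output_path="output/",
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
```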
-
@jaggukaka Have you tried to train Telugu?
-
Hello, has anyone done it?
-
Could you please let me know how to train the TTS model for Telugu? I have been trying to do it using the AI4basic dataset. @jaggukaka
-
So, I want to train TTS for an Indian language, Hindi. After going through the docs, I would say I have figured out a few of the things needed to achieve this. But the more I read the Discussions, the more confused I get. Can someone please help me understand what exact steps are needed to train a new model on my dataset? Below is what I have done based on my understanding so far:
- Prepared my dataset in the ljspeech format, so that it has a metadata.csv and the actual recordings under a wavs folder.
- Created a config.json file, taking tutorial_for_nervous_beginners & the FAQs as reference, which is below.

Now, I tried following the YourTTS notebooks as well, but got stuck at the restoring model state dict step because of the error below:
RuntimeError: Error(s) in loading state_dict for Vits: size mismatch for text_encoder.emb.weight: copying a param with shape torch.Size([165, 192]) from checkpoint, the shape in current model is torch.Size([103, 192]).
Also:
- I am not sure whether I should train the model from scratch, or restore a pretrained model and fine-tune it, since my language and its character set are entirely different from those the pretrained models were trained on.
- Where does the speaker encoder model fit into this process? Is it computed automatically if I run training with the above config.json?

Can someone guide me here or point me to a resource on what I need to do to train a TTS model on my Hindi dataset? Thanks for reading this far!