The performance of the new voice (finetune) is bad #11
Comments
What dataset do you use in the pretrain stage?
And is the language in pretrain and finetune the same?
A Mandarin multi-speaker dataset was used for pretraining. Another Chinese speaker was used for finetuning.
I noticed that only the decoder and speaker embeddings have gradients during finetuning. Shouldn't the decoder weights have no grad except for the conditional layer norm?
Do you set num_speaker in the model config equal to the number of speakers in the Mandarin dataset in the pretrain stage?
Only the speaker embedding and conditional layer norm. I follow the paper.
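For reference, a minimal PyTorch sketch of that freezing scheme is below. It assumes the model exposes its speaker embedding and conditional layer norm parameters under names containing `speaker_emb` and `cond_layer_norm`; the actual parameter names in this repo may differ, so adjust the substring checks accordingly.

```python
import torch

def freeze_for_finetune(model: torch.nn.Module) -> None:
    """Freeze all weights, then re-enable gradients only for the
    speaker embedding and conditional layer norm parameters.
    The name patterns below are assumptions, not this repo's exact names."""
    for name, param in model.named_parameters():
        if "speaker_emb" in name or "cond_layer_norm" in name:
            param.requires_grad = True
        else:
            param.requires_grad = False

# Then give the optimizer only the trainable parameters, e.g.:
# optimizer = torch.optim.Adam(
#     (p for p in model.parameters() if p.requires_grad), lr=2e-4)
```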
Yes, I use the default config "num_speaker: 955". There are 30 speakers in the pretrain stage, with speaker IDs ranging from 1 to 31. And I use speaker_id=50 in the finetune stage.
You have to change the default config "num_speaker" to 30 (in your case) in the pretrain stage. When finetuning, just set your speaker_id = 0.
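To illustrate why the IDs matter: the speaker embedding is a lookup table with num_speaker rows, so any speaker_id used at finetune time must be a valid index into that table. A rough sketch with hypothetical sizes (the embedding dimension and variable names are assumptions, not this repo's API):

```python
import torch

num_speaker = 30                 # matches the 30 pretraining speakers
emb_dim = 256                    # hypothetical embedding size
speaker_emb = torch.nn.Embedding(num_speaker, emb_dim)

finetune_speaker_id = 0          # reuse a valid slot for the new voice
vec = speaker_emb(torch.tensor([finetune_speaker_id]))
print(vec.shape)                 # torch.Size([1, 256])

# With num_speaker = 30, an id like 50 would raise an IndexError here,
# since only ids 0..29 exist in the embedding table.
```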
OK, I will give it a try. Thanks a lot.
@linlinsongyun Did the finetuning improve after you changed the number of speakers?
Thanks for your nice work.
The code works well in the pretrain stage. However, when I finetune towards an unseen voice with 10 sentences, the results are bad: the speech quality is poor, and the voice is significantly different. What went wrong?