The performance of the new voice (finetune) is bad #11
Comments
What dataset do you use in the pretrain stage?
And is the language in pretrain and finetune the same?
A Mandarin multi-speaker dataset was used for pretraining. Another Chinese speaker was used for finetuning.
I noticed that only the decoder and speaker embeddings have gradients during finetuning. Shouldn't the decoder weights have no grad except for the conditional layer norm?
Do you set num_speaker in the model config equal to the number of speakers in the Mandarin dataset in the pretrain stage?
Only the speaker embedding and conditional layer norm. I follow the paper.
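For reference, a minimal PyTorch sketch of that freezing scheme is below. It assumes the model exposes its speaker embedding and conditional layer norm parameters under names containing `speaker_emb` and `cond_layer_norm`; the actual parameter names in this repo may differ, so adjust the substring checks accordingly.

```python
import torch

def freeze_for_finetune(model: torch.nn.Module) -> None:
    """Freeze all weights, then re-enable gradients only for the
    speaker embedding and conditional layer norm parameters.
    The name patterns below are assumptions, not this repo's exact names."""
    for name, param in model.named_parameters():
        if "speaker_emb" in name or "cond_layer_norm" in name:
            param.requires_grad = True
        else:
            param.requires_grad = False

# Then give the optimizer only the trainable parameters, e.g.:
# optimizer = torch.optim.Adam(
#     (p for p in model.parameters() if p.requires_grad), lr=2e-4)
```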
Yes, I use the default config "num_speaker: 955". There are 30 speakers in the pretrain stage, with speaker IDs ranging from 1 to 31. And I use speaker_id=50 in the finetune stage.
You have to change the default config "num_speaker" to 30 (in your case) in the pretrain stage. When finetuning, just set your speaker_id = 0.
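To illustrate why the IDs matter: the speaker embedding is a lookup table with num_speaker rows, so any speaker_id used at finetune time must be a valid index into that table. A rough sketch with hypothetical sizes (the embedding dimension and variable names are assumptions, not this repo's API):

```python
import torch

num_speaker = 30                 # matches the 30 pretraining speakers
emb_dim = 256                    # hypothetical embedding size
speaker_emb = torch.nn.Embedding(num_speaker, emb_dim)

finetune_speaker_id = 0          # reuse a valid slot for the new voice
vec = speaker_emb(torch.tensor([finetune_speaker_id]))
print(vec.shape)                 # torch.Size([1, 256])

# With num_speaker = 30, an id like 50 would raise an IndexError here,
# since only ids 0..29 exist in the embedding table.
```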
OK, I will give it a try. Thanks a lot.
@linlinsongyun Did the finetuning improve after you changed the number of speakers?
Thanks for your nice work.
The code works well in the pretrain stage. However, when I finetune towards an unseen voice with 10 sentences, the results are bad: the speech quality is poor, and the voice is significantly different. What went wrong?