
Can you provide speaker embedding samples and inference samples? #6

Open
ljh0412 opened this issue Aug 24, 2022 · 2 comments
ljh0412 commented Aug 24, 2022

First of all, I really appreciate this repo. It has helped me a lot in learning about TTS.

However, I think I ran into some problems at the inference stage.

I trained the model on LibriTTS with configs adapted from the FastSpeech2 repo, only removing the language options.
(If you wish, I can open a pull request with them. It would be helpful for others training the model.)

While the training loss behaved as you showed, I cannot get proper duration predictions when I run inference.

I checked the training stage, where the synth_one_sample function runs, by saving wavs, and the predicted and reconstructed speech were of fairly good quality (with a bit of error in the mel prediction, though).

So I suspect there may be some issue with the mel embedding used by the conditional normalization layer, or with the speaker embedding.

Maybe there is some conflict between them?
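
For reference, a conditional normalization layer of this kind typically predicts the LayerNorm scale and shift from the speaker (or utterance-level mel) embedding. Here is a minimal PyTorch sketch of that idea, with illustrative names rather than this repo's actual modules:

```python
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    """LayerNorm whose scale/shift are predicted from a conditioning
    embedding (e.g., a speaker embedding). Illustrative sketch only."""

    def __init__(self, hidden_dim: int, cond_dim: int):
        super().__init__()
        # Plain normalization; the affine parameters come from the condition.
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.to_gamma = nn.Linear(cond_dim, hidden_dim)
        self.to_beta = nn.Linear(cond_dim, hidden_dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, hidden_dim); cond: (batch, cond_dim)
        gamma = self.to_gamma(cond).unsqueeze(1)  # (batch, 1, hidden_dim)
        beta = self.to_beta(cond).unsqueeze(1)
        return gamma * self.norm(x) + beta
```

Because gamma and beta are linear in the conditioning vector, an embedding with very large values directly yields very large activations, which turns out to be relevant to the resolution further down this thread.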

Either way, it would be helpful for me and other people to see some inference examples, such as speaker embedding samples and synthesized outputs.

I attach some samples, configs, and commands here:
tested_data.zip


cantabile-kwok commented Nov 15, 2022

I'm getting a very similar problem, and it has been troubling me for days. Have you solved it?

Here are my TensorBoard logs. They look pretty strange: the losses stop decreasing after a very short time (several thousand steps) and then start to blow up. This happens even before phone-level embedding prediction (which could also be part of the trouble!).
[screenshot: TensorBoard loss curves flattening and then diverging]

@cantabile-kwok

Luckily, I found that my problem originated not from the model or the code itself, but from the values of the x-vectors I was using. I used x-vectors extracted with the SpeechBrain library instead of the speaker embedding table, and the values in those x-vectors can range from -100 to +100. This caused numerical instability in the conditional layer norms, so the loss could not decrease. After normalizing the embeddings, training proceeded correctly.
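
For anyone hitting the same issue, here is a minimal sketch of the fix, assuming the SpeechBrain x-vector model `speechbrain/spkrec-xvect-voxceleb` (the exact model and the normalization choice are assumptions; the comment above does not specify them):

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Pretrained x-vector extractor (model name assumed; any SpeechBrain
# speaker-embedding model exposing encode_batch works the same way).
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-xvect-voxceleb",
    savedir="pretrained_xvect",
)

# Load one utterance; torchaudio returns (channels, time), which
# encode_batch accepts as a batch of mono waveforms.
wav, sr = torchaudio.load("sample.wav")

xvec = classifier.encode_batch(wav).squeeze()  # raw values can be large, e.g. -100..+100

# Normalize before conditioning the layer norms on it; L2 normalization
# is one simple choice (per-dimension standardization also works).
xvec = xvec / xvec.norm(p=2)
```

With unit-norm (or standardized) embeddings, the predicted LayerNorm scale and shift stay in a reasonable range and the loss can decrease normally.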
