Training issues #106

ZJ-CAI · 2022-06-17T08:19:57Z

Hello Mr. Betker. I'd like to train this model, but I have run into a problem.

I input three mel-spectrograms of the target voice together with the text, and then use the resulting output to calculate a loss against the target. For instance, the inputs are the mel-spectrograms of speaker Angie's speeches (speechA, speechB, and speechC) and the text of Angie's speechD. The target is the mel-spectrogram of Angie's speechD.

Unfortunately, after the whole pipeline (autoregressive, diffusion, CLVP, vocoder), the shape of the output is "torch.Size([1, 557056])", while the shape of the target mel-spectrogram is "torch.Size([1, 95520])". I am therefore unsure how to calculate the loss between the output and the target.

Also, is there any way to make their shapes match?

neonbjb · 2022-06-17T15:32:54Z

Maintainer

Hey,
You can't train Tortoise end to end. Each of the component models must be trained piecewise since they all have very different objective functions. It's a fairly involved, complicated process. I recommend you start by trying to train a small DALLE model and a diffusion model. These will get you the skills necessary to train Tortoise.
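
To illustrate the point about different objective functions, here is a minimal sketch in plain Python. It is not Tortoise's training code. It assumes two commonly used objectives: next-token cross-entropy for a DALLE-style autoregressive model over discrete tokens, and mean squared error on predicted noise for a diffusion model. The function names (`autoregressive_loss`, `diffusion_loss`) and the toy data are hypothetical.

```python
import math
import random


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target token under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def autoregressive_loss(logits_per_step, target_tokens):
    """Assumed stage-1 objective: mean next-token cross-entropy."""
    losses = [cross_entropy(l, t) for l, t in zip(logits_per_step, target_tokens)]
    return sum(losses) / len(losses)


def diffusion_loss(predicted_noise, true_noise):
    """Assumed stage-2 objective: mean squared error on the noise."""
    squared = [(p - n) ** 2 for p, n in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)


# Stage 1 (toy): uniform logits over a 4-token vocabulary, 3 time steps.
vocab_size = 4
uniform_logits = [[0.0] * vocab_size for _ in range(3)]
ar_loss = autoregressive_loss(uniform_logits, [1, 0, 3])

# Stage 2 (toy): a continuous noise vector and two candidate predictions.
rng = random.Random(0)
true_noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
perfect_loss = diffusion_loss(true_noise, true_noise)
zero_guess_loss = diffusion_loss([0.0] * 8, true_noise)
```

The two losses operate on different kinds of targets (discrete token indices versus continuous noise values), which is one reason a single end-to-end loss on the final waveform does not fit: each stage is optimized against its own target in its own training run.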


ZJ-CAI · 2022-06-21T01:46:23Z

Author

Thank you for your reply.
Do the DALLE and diffusion models refer to https://github.com/lucidrains/DALLE-pytorch and https://github.com/openai/guided-diffusion, respectively?


neonbjb · 2022-06-21T04:31:06Z

Maintainer

Sure, those would work.
