Replies: 3 comments
Hey,
Thank you for your reply.
Sure, those would work.
Hello Mr. Betker. I'd like to train this model, but I have run into a problem.
I feed in three mel-spectrograms of the target voice plus the text, and the resulting output is used to calculate the loss against the target. For instance, the input is the mel-spectrograms of speaker Angie's speeches (speechA, speechB, and speechC) and the text of Angie's speechD. The target would be the mel-spectrogram of Angie's speechD.
Unfortunately, after the whole pipeline (autoregressive, diffusion, CLVP, vocoder), the shape of the output is "torch.Size([1, 557056])", while the shape of the target mel-spectrogram is "torch.Size([1, 95520])". So I am confused about how to calculate the loss between the output and the target.
Also, is there any way to make their shapes the same?