Multispeaker dataset #44
Comments
It's all learned implicitly. There's no fundamental difference between a single-speaker and a multi-speaker dataset apart from the variance of the distribution of conditioning latents and predicted audio. It is perhaps better to model each wav file as an individual speaker: each speaker is a point in latent space, and there are general clusters corresponding to individual characters. You could circle each cluster and label it as the broad space of a single speaker's voice, but in practice there ought to be overlap between clusters for a sufficiently diverse multispeaker dataset.
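One rough way to see that structure is to extract a latent per clip and cluster them; tight clusters roughly correspond to individual speakers, with overlap for similar voices. This is only a sketch: `get_conditioning_latent()` below is a placeholder stand-in for whatever routine the model actually uses to turn a reference clip into a conditioning latent, not an API from this repo.

```python
import numpy as np
from sklearn.cluster import KMeans

def get_conditioning_latent(wav_path):
    # Placeholder: in practice this would run the model's conditioning encoder
    # on the clip and return its latent vector.
    rng = np.random.default_rng(abs(hash(wav_path)) % (2**32))
    return rng.normal(size=128)

wav_paths = [f"wavs/{i}.wav" for i in range(1, 9)]  # one entry per training clip
latents = np.stack([get_conditioning_latent(p) for p in wav_paths])

# Cluster the per-clip latents; each cluster roughly corresponds to one speaker,
# but overlap between clusters is expected for a diverse multispeaker dataset.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(latents)
for path, label in zip(wav_paths, labels):
    print(label, path)
```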
I think there is a potential idea to be applied here, actually: you could try to apply the exact same conditioning latent for EVERY line said by a specific character, but that would require additional code. "Multispeaker" in the current case just means exposing the model to more kinds of speakers during training. Ideally the model would learn to clone all of them, conditioned on the input zero-shot latent; in practice it underfits severely with the small number of epochs available in fine-tuning. I suspect a much, much longer training run might teach the model to correctly remember all the speakers, but it might also just lead to terrible overfitting on the existing lines.
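A minimal sketch of that idea, under the same assumption that `get_conditioning_latent()` is a placeholder for the model's conditioning encoder and that the metadata follows the wav|transcription|speaker layout discussed below: average the latents of all clips belonging to a character once, then reuse that single latent for every line that character speaks.

```python
from collections import defaultdict
import numpy as np

def get_conditioning_latent(wav_path):
    # Placeholder for the model's conditioning encoder.
    rng = np.random.default_rng(abs(hash(wav_path)) % (2**32))
    return rng.normal(size=128)

# Hypothetical metadata rows: (wav_path, transcription, speaker_name).
metadata = [
    ("wavs/1.wav", "Hello there.", "alice"),
    ("wavs/2.wav", "Another line.", "bob"),
    ("wavs/3.wav", "Yet another line.", "alice"),
]

per_speaker_paths = defaultdict(list)
for wav_path, _text, speaker in metadata:
    per_speaker_paths[speaker].append(wav_path)

# One fixed conditioning latent per character, shared by all of their lines.
speaker_latents = {
    speaker: np.mean([get_conditioning_latent(p) for p in paths], axis=0)
    for speaker, paths in per_speaker_paths.items()
}
```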
Hi, and thanks for your work.
Hi, is it possible to train this model on a multispeaker dataset? If so, can you explain how in detail? Thank you in advance.
I tried to train with a multispeaker dataset, but something went wrong.
What should a multispeaker dataset look like?
Should each line have a speaker identifier at the end, for example:
wavs/1.wav|transcription.
or
wavs/1.wav|transcription.|1
or
wavs/1.wav|transcription.|speaker_name
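For reference, a small parser that accepts either the two-field (`wav|transcription`) or three-field (`wav|transcription|speaker`) layout could look like the sketch below. The column layout is just the one from the examples above; it is not necessarily the format this repo expects.

```python
def parse_metadata_line(line):
    """Parse `wav|transcription` or `wav|transcription|speaker` lines."""
    parts = line.rstrip("\n").split("|")
    if len(parts) == 2:
        wav_path, text = parts
        speaker = None  # single-speaker style, no identifier
    elif len(parts) == 3:
        wav_path, text, speaker = parts
    else:
        raise ValueError(f"Unexpected number of fields in line: {line!r}")
    return wav_path, text, speaker

print(parse_metadata_line("wavs/1.wav|transcription."))
print(parse_metadata_line("wavs/1.wav|transcription.|speaker_name"))
```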