Training Model #4024
Unanswered
blessziamah
asked this question in
General Q&A
Replies: 1 comment
-
Training a text-to-speech model typically requires both audio recordings and their corresponding transcripts (text), because the model learns the mapping between spoken sounds and written language. If you only have voice recordings without transcripts, you face a significant challenge: the model needs the text to know what to synthesize. You can use an ASR model such as Whisper to generate transcripts for your audio files. Please mark this as answered if it resolves your question.
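As a rough illustration of the Whisper suggestion, here is a minimal sketch that transcribes a folder of `.wav` files and writes one `file_id|transcript` line per recording. It assumes the `openai-whisper` package (and `ffmpeg`) is installed. The pipe-delimited layout, the `wavs` folder name, and the helper names `build_metadata` and `whisper_transcriber` are illustrative assumptions, not something this project prescribes, so adjust them to whatever your training recipe expects.

```python
from pathlib import Path
from typing import Callable


def build_metadata(
    wav_dir: str,
    transcribe: Callable[[str], str],
    out_file: str = "metadata.csv",
) -> int:
    """Write one `file_id|transcript` line per .wav file; return the file count.

    `transcribe` is any function mapping an audio path to its text, so the
    ASR backend can be swapped out. The output layout is an assumption.
    """
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    with open(out_file, "w", encoding="utf-8") as f:
        for wav in wav_paths:
            text = transcribe(str(wav)).strip()
            # Keep the delimiter unambiguous: drop any pipe in the transcript.
            text = text.replace("|", " ")
            f.write(f"{wav.stem}|{text}\n")
    return len(wav_paths)


def whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Return a transcribe function backed by openai-whisper."""
    import whisper  # imported lazily; requires `pip install openai-whisper`

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"]
```

You would then call something like `build_metadata("wavs", whisper_transcriber())`. ASR output is not perfect, so it is worth spot-checking the generated transcripts before training on them.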
-
Is it possible for me to train a new model with a dataset that contains only voice recordings, with no transcripts?