16bit float? #266

Answered by sbersier
vurt72 asked this question in Q&A
Jan 24, 2023 · 4 comments · 4 replies
Ah! Now I think I understand your question. You were referring to the audio samples needed to generate the voice latents, right? (See the "Adding a new voice" paragraph on the landing page.)
The bit depth and sample format (s16, f32) and the sample rate don't matter, as long as the file is WAV or MP3. The "load_audio" function (in tortoise-tts/tortoise/utils/audio.py, between lines 29 and 55) handles the conversion and resamples the audio to 22050 Hz. You can give it 48 kHz f32le WAV or 96 kHz MP3 files and it should still work.
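
To illustrate why the input format stops mattering once the loader has run, here is a rough, standard-library-only sketch of the two steps involved: decoding s16 or f32 samples to floats in [-1, 1], then resampling to 22050 Hz. This is not the code in tortoise's "load_audio". The names `decode_samples` and `resample_linear` are hypothetical, the input is assumed to be raw mono little-endian sample bytes, and the naive linear interpolation is only a stand-in for whatever resampler the library really uses.

```python
import struct

TARGET_RATE = 22050  # target sample rate mentioned in the answer above


def decode_samples(raw: bytes, fmt: str) -> list[float]:
    """Convert little-endian mono sample bytes to floats in [-1.0, 1.0].

    `fmt` is a hypothetical label: "s16" (16-bit signed int) or "f32" (32-bit float).
    """
    if fmt == "s16":
        count = len(raw) // 2
        ints = struct.unpack(f"<{count}h", raw[: count * 2])
        return [v / 32768.0 for v in ints]  # scale integers to the float range
    if fmt == "f32":
        count = len(raw) // 4
        return list(struct.unpack(f"<{count}f", raw[: count * 4]))
    raise ValueError(f"unsupported sample format: {fmt}")


def resample_linear(samples: list[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> list[float]:
    """Naive linear-interpolation resampler (illustration only, no anti-aliasing)."""
    if not samples:
        return []
    if src_rate == dst_rate:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # distance between output samples, in input samples
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

Whatever goes in (48 kHz f32 or 96 kHz s16), what comes out is a float sequence at 22050 Hz, which is why the source bit depth and rate should not affect the later steps.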

Answer selected by vurt72