-
How can I save my .wav files as 16-bit float? I've never seen this format before, and it's not available in any audio program that I've ever run into; 64 and 32 are standard. Cheers
-
The resulting audio is indeed encoded as mono pcm_f32le at 24000 Hz, which is not a real problem nowadays since most players (VLC, MPlayer, ...) can play it. You can check the result with
-
Ah, the resulting audio. I was referring to "Save the clips as a WAV file with floating point format and a 22,050 sample rate." Really amazing results when it works; I've been using and having fun with text-to-speech since the C64 days! What a time it has been the last year with all the new AI stuff! :)
-
Yeah, the results are absolutely great and things are moving really fast!
-
Ah! Now, I think I understand your question... You were referring to the audio samples needed in order to generate the voice latents, right? (cf. "Adding a new voice" paragraph on the landing page).
The bit depth and format (s16, f32) and the sample rate don't matter as long as the file is WAV or MP3. The "load_audio" function (in tortoise-tts/tortoise/utils/audio.py, between lines 29 and 55) will handle it and resample it to 22050 Hz. You can give it 48 kHz f32le WAV or 96 kHz MP3 files and it should still work.
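To illustrate why the input format doesn't matter, here is a rough, stdlib-only sketch of the two steps such a loader conceptually performs: mapping integer samples to floats, then resampling to 22050 Hz. This is not the actual "load_audio" code; the helper names `pcm16_to_float` and `resample_linear` are made up for this example, and linear interpolation is a crude stand-in for a proper resampler.

```python
import math

TARGET_RATE = 22050  # sample rate mentioned in this thread


def pcm16_to_float(samples):
    """Map signed 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation.

    Hypothetical helper for illustration only: a real resampler would
    low-pass filter first to avoid aliasing when downsampling.
    """
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # distance between output samples, in input samples
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last input sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of a 440 Hz tone stored as 16-bit PCM at 48 kHz.
src_rate = 48000
pcm = [int(32767 * math.sin(2 * math.pi * 440 * n / src_rate)) for n in range(src_rate)]

# After conversion, the clip is float data at 22050 Hz regardless of the input format.
clip = resample_linear(pcm16_to_float(pcm), src_rate)
```

Whatever bit depth or rate you start from, the data ends up as floats at the target rate, so there should be no need to hunt for a special export format in your audio editor.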