16bit float? #266

vurt72 · 2023-01-24T20:05:40Z

How can I save my .wav files as 16-bit float? I've never seen this format before; it isn't available in any audio program I've ever run into. 64-bit and 32-bit are standard.
Has anyone found a converter?

Cheers

Answered by sbersier

sbersier · 2023-01-26T12:08:02Z

The resulting audio is indeed encoded as mono pcm_f32le (32-bit float) at 24000 Hz, which is not a real problem nowadays since most players (VLC, MPlayer, ...) can play it.
But you can re-encode it with, for example, FFmpeg (freely available for Linux, macOS and Windows).
If you want CD format (16 bits / 44100 Hz):
ffmpeg -i <the_produced_audio_file.wav> -ar 44100 -c:a pcm_s16le <new_filename.wav>
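For illustration, the float-to-16-bit part of that conversion can be sketched with Python's standard library. This is a minimal sketch, not what FFmpeg does internally: it assumes the samples are already decoded as floats in [-1.0, 1.0] and scales them by 32767, and it does not resample (that is the job of `-ar 44100` in the command above). The helper names `float_to_pcm16` and `write_pcm16_wav` are hypothetical.

```python
import array
import io
import sys
import wave


def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to signed 16-bit integers.

    Out-of-range values are clipped first, then scaled by 32767, so +1.0
    maps to 32767 and -1.0 maps to -32767 (an assumed scaling convention).
    """
    out = array.array("h")
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the valid float range
        out.append(int(round(x * 32767)))
    return out


def write_pcm16_wav(fileobj, samples, sample_rate):
    """Write mono float samples as a 16-bit PCM WAV (no resampling here)."""
    pcm = float_to_pcm16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16 bits
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())


if __name__ == "__main__":
    # Hypothetical demo: write a short ramp to an in-memory WAV file.
    buf = io.BytesIO()
    write_pcm16_wav(buf, [i / 10 for i in range(-10, 11)], 44100)
    print(len(buf.getvalue()), "bytes written")
```

Clipping before scaling matters because float WAVs can legally hold values slightly beyond ±1.0, which would otherwise overflow the 16-bit range.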

You can check the result with ffprobe <audio_file> (look at the Stream #0:0 line).
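If ffprobe isn't at hand, the same information can be read straight from the WAV header. A minimal Python sketch, assuming a standard RIFF/WAVE file: it walks the chunks until it finds `fmt `, whose format tag is 1 for integer PCM and 3 for IEEE float. Tag 0xFFFE (WAVE_FORMAT_EXTENSIBLE) keeps the real subformat further into the chunk, which this sketch does not decode. `wav_format` is a hypothetical name.

```python
import struct

# Format tags of the WAV "fmt " chunk that matter for this thread.
FORMAT_NAMES = {1: "PCM (integer)", 3: "IEEE float", 0xFFFE: "WAVE_FORMAT_EXTENSIBLE"}


def wav_format(data: bytes) -> dict:
    """Return the main fields of the 'fmt ' chunk of a RIFF/WAVE byte string."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk starts right after the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format": FORMAT_NAMES.get(tag, f"unknown ({tag})"),
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    raise ValueError("no 'fmt ' chunk found")
```

For example, `with open("output.wav", "rb") as f: print(wav_format(f.read()))`, where the filename is a placeholder. A file written by TorToiSe should show 32 bits per sample at 24000 Hz, and the re-encoded one 16 bits at 44100 Hz.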

vurt72 (Author) · 2023-01-26T13:31:32Z

Ah, you mean the resulting audio. I was referring to "Save the clips as a WAV file with floating point format and a 22,050 sample rate."
I could swear it said "16 bit float", or maybe it says that on another page I was on... hmm. Well, anyway, from more experiments this does not seem to matter: I save my voice .wav files as PCM and it seems to handle them fine. But maybe there's an advantage to using float instead, like rendering time; I didn't compare.

Really amazing results when it works! I've been using and having fun with text-to-speech since the C64 days. What a time this last year has been with all the new AI stuff! :)

sbersier · 2023-01-26T14:09:02Z

Yeah, the results are absolutely great and things are moving really fast!

sbersier · 2023-01-26T14:49:37Z

Ah! Now I think I understand your question... You were referring to the audio samples needed to generate the voice latents, right? (cf. the "Adding a new voice" paragraph on the landing page.)
The bit depth and sample format (s16, f32) and the sample rate don't matter as long as the file is WAV or MP3. The "load_audio" function (in tortoise-tts/tortoise/utils/audio.py, between lines 29 and 55) will handle it and resample it to 22050 Hz. You can give it 48 kHz f32le WAV or 96 kHz MP3 files and it should still work.
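The rate change itself can be pictured with a small sketch. This is not TorToiSe's code: `resample_linear` is a hypothetical helper that uses plain linear interpolation and skips the anti-aliasing filtering a real resampler applies. It only illustrates how input at any rate (48 kHz, 96 kHz, ...) ends up as a sequence at 22050 Hz.

```python
def resample_linear(samples, src_rate, dst_rate=22050):
    """Resample a mono signal by linear interpolation.

    Deliberately simple: no low-pass filtering, so downsampling can alias.
    Suitable only for illustrating the change of sample rate.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        if j >= len(samples) - 1:
            out.append(float(samples[-1]))  # hold the last sample at the edge
            continue
        frac = pos - j
        out.append(samples[j] * (1.0 - frac) + samples[j + 1] * frac)
    return out
```

One second of 48 kHz audio (48000 samples) comes out as 22050 samples, while a clip that is already at 22050 Hz is returned unchanged.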

vurt72 Jan 26, 2023
Author

Thanks! Good to know!

FurkanGozukara Feb 11, 2023

@sbersier I don't want it to convert.

So a 32-bit float, 22050 Hz WAV wouldn't be converted? Is that natively supported?

sbersier Feb 11, 2023

If you look into tortoise-tts/tortoise/utils/audio.py in lines 16-55:
You can see that TorToiSe accepts int16, int32, float16 and float32 data (lines 16-26) in WAV and MP3 files (lines 30-36). If the sample rate is not 22050 Hz, it will resample to that rate (line 47). So you should be OK; there is nothing to do.

sbersier Feb 11, 2023

Note: line 26 reads:
return (torch.FloatTensor(data.astype(np.float32)) / norm_fix, sampling_rate)

So float32 seems to be what TorToiSe ultimately uses. But I'm not the author of TorToiSe...
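To make the `/ norm_fix` step concrete: dividing by a per-type constant puts integer samples into the same float range that float WAVs already use, so a 16-bit PCM clip and a 32-bit float clip end up in the same representation. A minimal sketch, assuming `norm_fix` is the full-scale value of each integer type (2**15 for int16, 2**31 for int32) and 1 for float input; those values are an assumption here, not quoted from audio.py, and `to_float32_range` is a hypothetical name. Python floats stand in for float32.

```python
# Assumed divisors: the full-scale magnitude of each integer type.
# Float data is taken to be in [-1.0, 1.0] already, so it is divided by 1.
NORM_FIX = {"int16": 2 ** 15, "int32": 2 ** 31, "float16": 1.0, "float32": 1.0}


def to_float32_range(samples, dtype_name):
    """Scale raw samples of the given dtype into floats around [-1.0, 1.0]."""
    try:
        norm_fix = NORM_FIX[dtype_name]
    except KeyError:
        raise ValueError(f"unsupported sample dtype: {dtype_name}") from None
    return [s / norm_fix for s in samples]
```

For example, an int16 sample of 16384 becomes 16384 / 32768 = 0.5, and -32768 becomes -1.0, while float32 samples pass through unchanged.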