Format of WAV samples #145

timboxyz · 2022-08-27T17:59:59Z

timboxyz
Aug 27, 2022

The README.md section on "Adding a new voice" states:
"Save the clips as a WAV file with floating point format and a 22,050 sample rate"
However, the samples in:
https://github.com/neonbjb/tortoise-tts/tree/main/tortoise/voices/....
seem to be in the more common pcm_s16le format (PCM signed 16-bit little-endian) rather than a floating point format such as pcm_f32le (PCM 32-bit floating point little-endian).

So which is it?

neonbjb · 2022-08-29T04:54:05Z

neonbjb
Aug 29, 2022
Maintainer

It doesn't matter between those two formats. There are a few supported formats and a few that just don't work. Rather than enumerate all of the ones that work, I just stated fp32.
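One way to see why the two formats can be interchangeable is that both decode to the same normalized float samples. The sketch below is illustrative only and is not Tortoise's own loader: `load_wav_as_float` is a hypothetical helper that reads the RIFF header with the standard library and handles just 16-bit integer PCM (format tag 1) and 32-bit float (format tag 3). Other encodings, including WAVE_FORMAT_EXTENSIBLE files, are rejected.

```python
import struct

WAVE_FORMAT_PCM = 1         # integer PCM, e.g. pcm_s16le
WAVE_FORMAT_IEEE_FLOAT = 3  # IEEE floating point, e.g. pcm_f32le


def load_wav_as_float(path):
    """Return (sample_rate, channels, samples), samples scaled to [-1.0, 1.0].

    Hypothetical helper for illustration; samples stay interleaved if the
    file has more than one channel.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    fmt = None
    payload = None
    pos = 12
    # Walk the chunks: 4-byte id, 4-byte little-endian size, then the body.
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt ":
            fmt = struct.unpack_from("<HHIIHH", body, 0)
        elif chunk_id == b"data":
            payload = body
        pos += 8 + size + (size & 1)  # chunk bodies are padded to even length
    if fmt is None or payload is None:
        raise ValueError("missing fmt or data chunk")

    format_tag, channels, sample_rate, _byte_rate, _block_align, bits = fmt
    if format_tag == WAVE_FORMAT_PCM and bits == 16:
        count = len(payload) // 2
        ints = struct.unpack("<%dh" % count, payload[:count * 2])
        # Map the int16 range [-32768, 32767] onto [-1.0, 1.0).
        samples = [value / 32768.0 for value in ints]
    elif format_tag == WAVE_FORMAT_IEEE_FLOAT and bits == 32:
        count = len(payload) // 4
        samples = list(struct.unpack("<%df" % count, payload[:count * 4]))
    else:
        raise ValueError(
            "unsupported encoding: format tag %d, %d bits" % (format_tag, bits)
        )
    return sample_rate, channels, samples
```

Calling `load_wav_as_float("clip.wav")` (with `clip.wav` standing in for any voice sample) also makes it easy to confirm that a clip really is at 22,050 Hz before adding it to a voice folder.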

0 replies

ghost · 2022-09-15T17:48:36Z

ghost
Sep 15, 2022

Is it possible to train / output using 48 kHz audio?

1 reply

neonbjb Sep 15, 2022
Maintainer

Not as it currently stands. I suspect that you could probably retrain the univnet vocoder to output higher-quality audio using the same mel inputs. It'd basically be training an upsampler, and I think speech upsampling is a pretty tractable problem.

mkirch · 2022-10-05T14:46:59Z

mkirch
Oct 5, 2022

This is something I wondered as well. Might be my lack of knowledge about audio processing, but what is the reasoning for inputting at 22050 and then generating the output at 24000?

Awesome library btw, thanks for the hard work putting this out there.

0 replies

neonbjb · 2022-10-06T18:14:14Z

neonbjb
Oct 6, 2022
Maintainer

Because midway through figuring out how to get this to work, I decided I wanted to use a vocoder. The best vocoder I could find was univnet, and the pretrained models use a 24 kHz sampling rate. I was unwilling to retrain my AR model to use this sample rate, and I didn't have any interest in retraining univnet either, so I trained the diffusion model to bridge the gap.

If the system were retrained from scratch, I would probably just train it at 24 kHz.
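For a sense of the size of the gap between the two rates, the arithmetic can be sketched as below. This only illustrates the rate ratio and the resulting sample counts; it says nothing about how the diffusion model bridges the gap internally. The function names are hypothetical.

```python
from fractions import Fraction

INPUT_RATE = 22_050   # rate the README asks for in voice clips
OUTPUT_RATE = 24_000  # rate of the pretrained univnet models, per this thread


def rate_ratio(src_rate, dst_rate):
    """Exact ratio of output samples to input samples for the same duration."""
    return Fraction(dst_rate, src_rate)


def resampled_length(num_samples, src_rate, dst_rate):
    """Samples needed at dst_rate to cover num_samples at src_rate (rounded down)."""
    return num_samples * dst_rate // src_rate


print(rate_ratio(INPUT_RATE, OUTPUT_RATE))  # 160/147
# Ten seconds of input audio corresponds to this many output samples:
print(resampled_length(INPUT_RATE * 10, INPUT_RATE, OUTPUT_RATE))  # 240000
```

By comparison, going from 24 kHz to 48 kHz is an exact factor of 2, which fits the upsampler framing used earlier in the thread.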

1 reply

mkirch Oct 18, 2022

This is really helpful context, thanks for the response!
