Example for raw audio #21

mm3509 · 2023-12-03T17:21:34Z

Hello, and thanks for the code! I want to replicate the audio results from the paper, but the DeepMind repo does not have a VQ-VAE example for audio (see google-deepmind/sonnet#141), and the audio setup described in the paper seems quite different from the one for CIFAR:

We train a VQ-VAE where the encoder has 6 strided convolutions with stride 2 and window-size 4. This yields a latent space 64x smaller than the original waveform. The latents consist of one feature map and the discrete space is 512-dimensional.

Could you please include an example of using your code for audio?
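For reference, the geometry in the quoted passage can be checked with a short calculation. This is only a sketch under assumptions, not the paper's or this repo's implementation: the padding of 1 and the 16384-sample clip length are hypothetical choices (the quote gives only stride 2 and window size 4), and "512-dimensional" is read here as a codebook of 512 entries.

```python
import math


def conv1d_out_len(n, kernel=4, stride=2, padding=1):
    """Output length of one 1-D convolution (standard formula, dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def latent_len(n_samples, n_layers=6, kernel=4, stride=2, padding=1):
    """Length of the latent sequence after a stack of identical strided convs."""
    for _ in range(n_layers):
        n_samples = conv1d_out_len(n_samples, kernel, stride, padding)
    return n_samples


n_samples = 16384                  # hypothetical clip length (assumption)
n_latents = latent_len(n_samples)  # 6 layers, each halving the length

# Assumption: a codebook of 512 entries, so each latent index needs log2(512) bits.
bits_per_latent = math.log2(512)

print(n_latents)               # 256
print(n_samples // n_latents)  # 64
print(bits_per_latent)         # 9.0
```

With padding 1, each layer halves an even-length input exactly, so six layers give the 2^6 = 64x reduction mentioned in the quote. A different padding choice would change the lengths slightly at the edges.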

UkiTenzai · 2025-03-13T13:53:32Z

Why not take a look at AudioDec and Descript-Audio-Codec? They are open source.

mm3509 · 2025-03-14T12:01:03Z

Thank you @UkiTenzai. I checked the GitHub pages for both (https://github.com/facebookresearch/AudioDec and https://github.com/descriptinc/descript-audio-codec) and neither seems to do voice cloning, i.e. neural voice transfer, right? That is what I would like to do with the VQ-VAE.

UkiTenzai · 2025-03-15T08:23:26Z

Sorry, AudioDec and DAC are for compression. You can try SpeechTokenizer (https://github.com/ZhangXInFD/SpeechTokenizer/), which uses a VQ-VAE and can be used for zero-shot voice conversion (VC). Although many similar VQ-VAE models now surpass it, they are essentially improvements on it, so it is worth learning this one first.

mm3509 · 2025-03-15T16:49:57Z

Thank you. I checked the repo and it doesn't mention voice cloning either, and an online search for SpeechTokenizer and voice cloning did not show any applications, so I wouldn't know where to start. Could you please point me to an application or sample code that uses SpeechTokenizer for neural voice transfer?
