Example for raw audio #21
Comments
Why not take a look at AudioDec and Descript-Audio-Codec? They are open source.
Thank you @UkiTenzai. I checked the GitHub pages for both (https://github.com/facebookresearch/AudioDec and https://github.com/descriptinc/descript-audio-codec) and neither seems to do vocal cloning, i.e. voice neural transfer, right? That's what I would like to do with the VQ-VAE.
Sorry, AudioDec and DAC are for compression. You can try SpeechTokenizer (https://github.com/ZhangXInFD/SpeechTokenizer/), which uses a VQ-VAE and can be used for zero-shot VC. There are many similar VQ-VAE models that surpass it, but they all basically build on it, so it is worth learning this one first.
Thank you. I checked the repo and it doesn't mention vocal cloning either, and an online search for SpeechTokenizer and vocal cloning did not show any applications, so I wouldn't know where to start. Could you please point me to an application or sample code using SpeechTokenizer for neural voice transfer?
Hello, and thanks for the code! I want to replicate the audio results from the paper, but the DeepMind repo does not have a VQ-VAE example for audio (see google-deepmind/sonnet#141), and it seems quite different from the one for CIFAR:
Could you please include an example of using your code for audio?