Voice-Swapper

Real-time voice conversion using GANs implemented on RPi4.
Explore the dataset »

Table of Contents

About The Project
Objectives
Scope
Model Architecture
Roadmap
Contact
Acknowledgments

About The Project

A voice-changing dictaphone

Voice-Swapper is a dictaphone that converts the user's voice (the source) into a target voice without losing any linguistic information. Voice conversion (VC) is useful in many applications, such as customizing audiobook and avatar voices, dubbing, voice modification, voice restoration after surgery, and cloning the voices of historical persons. VC models are primarily implemented with Generative Adversarial Networks (GANs), which give promising results by generating the statements fed in by the user in the target's voice. We aim to build these models from scratch and implement them on an NVIDIA Jetson, a powerful device commonly used for AI applications. This is an inter-SIG project between Diode and CompSoc.

Use the README.md to get started.

Objectives

To be the first to implement CycleGAN in TensorFlow 2.0 (there is no existing implementation).
To train the CycleGAN model on the "Trump" and "Peter Griffin" datasets.
To implement these models on a web application.
To perform voice swapping (conversion) in real time.

Scope

If time permits, we aim to propose a novel model informed by a survey of model performances in the Voice Conversion Challenge 2016 (VCC2016), and to write a research paper comparing its performance with that of existing models.

Click here for the complete proposal.

Model Architecture

CycleGAN

One of the important characteristics of speech is that it has sequential and hierarchical structures, e.g., voiced or unvoiced segments and phonemes or morphemes. An effective way to represent such structures would be to use an RNN, but it is computationally demanding due to the difficulty of parallel implementations.

Instead, we configure a CycleGAN using gated CNNs, which not only allow parallelization over sequential data but also achieve state-of-the-art results in speech modeling. In a gated CNN, gated linear units (GLUs) are used as the activation function. A GLU is a data-driven activation function, and its gating mechanism allows information to be selectively propagated depending on the previous layer's states.
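To make the gating idea concrete, here is a minimal NumPy sketch of a single gated 1D convolution layer. It is an illustration only, not the code used in this repository: the function names (`conv1d_same`, `gated_conv1d`), the weight shapes, and the odd kernel size are assumptions made for the example. Two convolutions read the same previous-layer output; one gives a linear response, the other is passed through a sigmoid and acts as a per-element gate.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_same(x, w, b):
    """Plain 1D convolution with zero padding ("same" length).

    x: (time, in_channels), w: (kernel, in_channels, out_channels), b: (out_channels,)
    The kernel size is assumed to be odd so the output keeps the input length.
    """
    kernel = w.shape[0]
    pad = kernel // 2
    x_padded = np.pad(x, ((pad, pad), (0, 0)))
    # Every time step is computed independently of the others,
    # which is what makes this layer easy to parallelize.
    out = np.stack([
        np.tensordot(x_padded[t:t + kernel], w, axes=([0, 1], [0, 1]))
        for t in range(x.shape[0])
    ])
    return out + b


def gated_conv1d(x, w_lin, b_lin, w_gate, b_gate):
    """GLU: linear response multiplied element-wise by a sigmoid gate."""
    linear = conv1d_same(x, w_lin, b_lin)
    gate = sigmoid(conv1d_same(x, w_gate, b_gate))  # values in (0, 1)
    return linear * gate


# Hypothetical sizes: 8 time steps, 4 input channels, 6 output channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
w_lin = rng.standard_normal((3, 4, 6))
w_gate = rng.standard_normal((3, 4, 6))
b_lin = np.zeros(6)
b_gate = np.zeros(6)

y = gated_conv1d(x, w_lin, b_lin, w_gate, b_gate)
print(y.shape)
```

Because the gate lies strictly between 0 and 1, each output element is a scaled-down copy of the linear response; a gate near 0 blocks that element and a gate near 1 lets it through.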

MelGAN

MelGAN-VC is a voice conversion method that relies on non-parallel speech data and can convert audio signals of arbitrary length from a source voice to a target voice. It first computes spectrograms from waveform data and then performs a domain translation using a Generative Adversarial Network (GAN) architecture. An additional siamese network helps preserve speech information during translation, without sacrificing the ability to flexibly model the style of the target speaker.
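The first step described above, turning a waveform into a spectrogram, can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not the preprocessing used in MELGAN_VC.ipynb: the function name `magnitude_spectrogram`, the FFT size, the hop length, and the 16 kHz test tone are all hypothetical, and a plain linear-frequency magnitude spectrogram is used in place of a mel-scaled one.

```python
import numpy as np


def magnitude_spectrogram(wave, n_fft=512, hop=128):
    """Short-time Fourier magnitude of a mono waveform.

    Returns an array of shape (n_fft // 2 + 1, n_frames):
    frequency bins along the rows, time frames along the columns.
    """
    if len(wave) < n_fft:
        raise ValueError("waveform is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 440.0 * t)

spec = magnitude_spectrogram(wave)
print(spec.shape)  # (257, 122)
```

The number of columns grows with the length of the input, so the time axis of the spectrogram is what carries the "arbitrary length" of the audio into the translation step.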

Roadmap

Contact

Palgun N P - [email protected]

Harish Gumnur - [email protected]

Nikhil P Reddy - [email protected]

Project Link: https://github.com/IEEE-NITK/Voice-Swapper

Repository files (21 commits)

images
preprocessed_trump
tf_2_version
MELGAN_VC.ipynb
README.md
discriminator.py
environment.yml
preprocessed_SF1TM2.tar.bz2
unet_generator.py
unet_parts.py