Face-Generation-from-Speech

Implementation Details - VoiceGAN

Overall architecture of our VoiceGAN (see model.png):

Details

Face Embedding Extraction from Pre-trained DeepSphere Model
Kaldi VoxCeleb X-Vector Extraction
Joint Embedding Network using MLP
Conditional DCGAN for Image Synthesis with Scaling Loss

Datasets:

VGGFace2, VoxCeleb2, and VoxCeleb1 (used only for X-Vector training)

This work pairs X-Vector speaker embeddings with DeepSphere face embeddings to train a joint embedding network using the N-pair loss. Speaker embeddings are projected into the joint embedding space, and the projected embeddings are then used to condition face image generation.
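
To make the N-pair objective concrete, here is a minimal sketch of the multi-class N-pair loss in plain Python. It is an illustration under stated assumptions, not the code from this repository: the function name `n_pair_loss`, the use of an unscaled dot product as the similarity, and the choice of speech embeddings as anchors with face embeddings as positives are all assumptions made for the example.

```python
import math


def n_pair_loss(voice, face):
    """Multi-class N-pair loss over a batch of matched embedding pairs.

    Assumption for this sketch: voice[i] and face[i] belong to the same
    identity, and face[j] with j != i acts as a negative for voice[i].
    Both arguments are lists of equal-length lists of floats.
    """
    n = len(voice)
    total = 0.0
    for i in range(n):
        # Dot-product similarity between anchor i and every face embedding.
        sims = [sum(a * b for a, b in zip(voice[i], face[j])) for j in range(n)]
        # log(1 + sum_{j != i} exp(s_ij - s_ii)) equals logsumexp(sims) - s_ii.
        # Subtracting the row maximum keeps the exponentials numerically stable.
        m = max(sims)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_sum_exp - sims[i]
    return total / n


# Toy batch: two identities with 2-dimensional embeddings (hypothetical values).
voice_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_faces = [[1.0, 0.0], [0.0, 1.0]]
swapped_faces = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = n_pair_loss(voice_batch, aligned_faces)
swapped_loss = n_pair_loss(voice_batch, swapped_faces)
```

The loss is lower when each speech embedding is most similar to the face embedding of the same identity, which is the property the joint embedding network is trained to achieve. In a real training loop the same quantity would typically be computed on batched tensors in a deep learning framework.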

Preliminary Results

Example faces generated conditioned solely on speech input (see result.png).
