- Face Embedding Extraction from Pre-trained DeepSphere Model
- Kaldi VoxCeleb X-Vector Extraction
- Joint Embedding Network using MLP
- Conditional DCGAN for Image Synthesis with Scaling Loss
VGGFace2, VoxCeleb2, VoxCeleb1 (used only for X-Vector training)
- This work uses X-Vector speaker embeddings together with DeepSphere face embeddings to train a joint embedding network with the N-Pair loss. Speaker embeddings projected into the joint embedding space are then used to condition the generation of face images.
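
To make the training objective concrete, here is a minimal sketch of an N-pair loss over a batch of matched speech and face embeddings. It assumes the common multi-class formulation, in which every other identity in the batch serves as a negative; the exact variant used in this repository is not stated above. The function name `n_pair_loss`, the array shapes, and the use of NumPy are illustrative choices, not taken from the project code.

```python
import numpy as np


def n_pair_loss(speech_emb: np.ndarray, face_emb: np.ndarray) -> float:
    """Multi-class N-pair loss over a batch of matched (speech, face) pairs.

    Row i of speech_emb and row i of face_emb are assumed to belong to the
    same identity; every other row in the batch acts as a negative.
    """
    # Similarity of every speech anchor to every face embedding: shape (N, N).
    logits = speech_emb @ face_emb.T
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax over each row; the diagonal holds the positive pair.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Hypothetical toy batch: 4 identities, 8-dimensional joint embeddings.
rng = np.random.default_rng(0)
faces = rng.normal(size=(4, 8))
aligned = n_pair_loss(5.0 * faces, faces)         # each speech vector matches its face
shuffled = n_pair_loss(5.0 * faces[::-1], faces)  # identities deliberately mismatched
print(aligned < shuffled)
```

The loss is lower when each speech embedding lies closest to the face embedding of the same identity, which is the property the joint embedding network is trained to produce.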
Example faces generated conditioned solely on speech input.