Personalized Portrait Generator: Realistic Speech-to-Portrait Generation via Face Prior Guided Diffusion Model
[1] We propose a novel speech-conditioned latent diffusion model (LDM), Speech-Conditioned portrait generation with Face Prior guidance (SCFP), which formulates the LDM as a personalized portrait generator.
[2] We design a sample-adaptive weighting module that dynamically weights the face prior, emphasizing the subtle personalized variations conveyed in speech to further preserve identity.
[3] We introduce a pre-training procedure that combines contrastive learning with face reconstruction, which aims to align speech and face representations while simultaneously reconstructing facial structures.
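The pre-training objective in [3] can be illustrated with a minimal sketch. It assumes a symmetric InfoNCE loss for the contrastive term and a mean-squared error for the face reconstruction term; neither choice is confirmed by the text above, and the names `info_nce`, `reconstruction_mse`, `pretraining_loss`, the temperature value, and the balancing weight `lam` are all hypothetical.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(speech, face, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (speech, face) embeddings.

    Assumption: row i of `speech` and row i of `face` come from the same person.
    """
    s = [l2_normalize(v) for v in speech]
    f = [l2_normalize(v) for v in face]
    n = len(s)
    # Cosine similarity between every speech/face pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(s[i], f[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(matrix):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(matrix):
            peak = max(row)
            log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    # Average the speech-to-face and face-to-speech directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed))


def reconstruction_mse(pred, target):
    # Mean squared error over a batch of flattened face images.
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def pretraining_loss(speech, face, recon, target, lam=1.0):
    # Hypothetical combination: alignment term plus weighted reconstruction term.
    return info_nce(speech, face) + lam * reconstruction_mse(recon, target)
```

In this sketch the contrastive term pulls each speech embedding toward the face embedding of the same speaker and away from the other faces in the batch, while the reconstruction term keeps the learned representation informative about facial structure.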