1. Lip-syncing introduction

This work addresses the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or on videos of specific people seen during training, but they struggle to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained videos. Wav2Lip tackles this problem by learning from a powerful pre-trained lip-sync discriminator, and the results show that the lip-sync accuracy of videos generated by Wav2Lip is almost as good as that of real synced videos.
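The lip-sync discriminator can be illustrated with the sync loss described in the Wav2Lip paper. The frozen expert embeds a short window of face frames and the matching audio segment, and the cosine similarity of the two embeddings is read as the probability that the pair is in sync; the generator is penalized with the negative log of that probability. Below is a minimal pure-Python sketch of this loss on plain lists. The function names are hypothetical, and this is not PaddleGAN's implementation, which works on batched tensors.

```python
import math
from typing import List, Sequence


def sync_probability(video_emb: Sequence[float], audio_emb: Sequence[float],
                     eps: float = 1e-8) -> float:
    """Cosine similarity between a video embedding and an audio embedding.

    With non-negative (post-ReLU) embeddings, as in the paper, the value lies
    in [0, 1] and is interpreted as the probability that the clip is in sync.
    """
    dot = sum(v * s for v, s in zip(video_emb, audio_emb))
    norm = math.sqrt(sum(v * v for v in video_emb)) * math.sqrt(sum(s * s for s in audio_emb))
    return dot / max(norm, eps)  # eps guards against division by zero


def sync_loss(video_embs: List[Sequence[float]], audio_embs: List[Sequence[float]],
              eps: float = 1e-8) -> float:
    """Mean negative log sync probability over a batch (BCE with target 1)."""
    probs = [sync_probability(v, s, eps) for v, s in zip(video_embs, audio_embs)]
    # Clamp to eps so log() never receives 0 for a fully out-of-sync pair.
    return sum(-math.log(max(p, eps)) for p in probs) / len(probs)


# Aligned embeddings give a loss near 0; orthogonal ones give a large loss.
print(sync_loss([[1.0, 0.0]], [[1.0, 0.0]]))
print(sync_loss([[1.0, 0.0]], [[0.0, 1.0]]))
```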

2. How to use

2.1 Test

The pretrained model can be downloaded from here. Run the following command to perform lip-syncing; the output is the synced video.

cd applications
python tools/ \
    --face ../docs/imgs/mona7s.mp4 \
    --audio ../docs/imgs/guangquan.m4a \
    --outfile pp_guangquan_mona7s.mp4


  • face: path of the input image or video file containing faces.
  • audio: path of the input audio file. Formats such as .wav, .mp3 and .m4a are accepted; any file containing audio data that FFMPEG can read should work.
  • outfile: path of the output video generated by Wav2Lip.
  • face_enhancement: whether to enhance the face in the result; default is False.
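If you drive inference from Python rather than the shell, the options above can be assembled into an argument list for `subprocess.run`. This is a minimal sketch under assumptions: the inference script path is a hypothetical placeholder because it is not spelled out here, and `--face_enhancement` is assumed to be a boolean switch passed only when enabled.

```python
import shlex
from typing import List


def build_wav2lip_cmd(script: str, face: str, audio: str, outfile: str,
                      face_enhancement: bool = False) -> List[str]:
    """Assemble the lip-sync inference command as an argv list.

    `script` is a placeholder for the inference script under tools/.
    --face_enhancement is assumed to take no value (a store-true style flag).
    """
    cmd = ["python", script,
           "--face", face,
           "--audio", audio,
           "--outfile", outfile]
    if face_enhancement:
        cmd.append("--face_enhancement")
    return cmd


cmd = build_wav2lip_cmd("tools/<inference_script>.py",  # hypothetical path
                        "../docs/imgs/mona7s.mp4",
                        "../docs/imgs/guangquan.m4a",
                        "pp_guangquan_mona7s.mp4")
print(shlex.join(cmd))
# To execute it from the applications/ directory:
# subprocess.run(cmd, cwd="applications", check=True)
```

Passing a list instead of a single string avoids shell quoting issues when file paths contain spaces.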

2.2 Training

  1. Our models are trained on LRS2. See here for a few suggestions on training with other datasets.

The preprocessed LRS2 dataset folder should have the following structure:

preprocessed_root (lrs2_preprocessed)
├── list of folders
│   ├── Folders with five-digit numbered video IDs
│   │   ├── *.jpg
│   │   ├── audio.wav
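Before training, it can help to check that the preprocessed folder matches the layout above. The helper below is hypothetical and not part of the repository: it walks the two directory levels and reports every video folder that lacks frames (`*.jpg`) or `audio.wav`.

```python
from pathlib import Path
from typing import List


def find_incomplete_videos(preprocessed_root: str) -> List[str]:
    """Return video folders (relative to the root) missing *.jpg frames or audio.wav."""
    root = Path(preprocessed_root)
    problems = []
    # First level: the list of folders; second level: five-digit video ID folders.
    for group in sorted(p for p in root.iterdir() if p.is_dir()):
        for video in sorted(p for p in group.iterdir() if p.is_dir()):
            has_frames = any(video.glob("*.jpg"))
            has_audio = (video / "audio.wav").is_file()
            if not (has_frames and has_audio):
                problems.append(video.relative_to(root).as_posix())
    return problems


# Example: find_incomplete_videos("lrs2_preprocessed") returns [] for a complete dataset.
```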

Place the LRS2 filelists (train.txt, val.txt, test.txt) in the filelists/ folder.
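In the original Wav2Lip code, each filelist line holds a video folder path relative to the preprocessed root (for example `<folder>/<five-digit ID>`). Assuming the same convention applies here, a hypothetical loader that resolves the entries of one split could look like this:

```python
from pathlib import Path
from typing import List


def load_split(preprocessed_root: str, split: str,
               filelist_dir: str = "filelists") -> List[Path]:
    """Resolve the entries of <filelist_dir>/<split>.txt to video folders under the root.

    Assumes one relative folder path per line; anything after the first
    whitespace on a line is ignored and blank lines are skipped.
    """
    entries = []
    with open(Path(filelist_dir) / f"{split}.txt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entries.append(Path(preprocessed_root) / line.split()[0])
    return entries


# Example: train_dirs = load_split("lrs2_preprocessed", "train")
```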

  2. You can train the model either without the additional visual quality discriminator or with it. For the former, run:
  • For single GPU:
python tools/ --config-file configs/wav2lip.yaml
  • For multiple GPUs:
python -m paddle.distributed.launch \
    tools/ \
    --config-file configs/wav2lip.yaml

For the latter, run:

  • For single GPU:
python tools/ --config-file configs/wav2lip_hq.yaml
  • For multiple GPUs:
python -m paddle.distributed.launch \
    tools/ \
    --config-file configs/wav2lip_hq.yaml
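The difference between the two training options shows up in the generator objective described in the Wav2Lip paper: the total loss blends an L1 reconstruction loss, the sync loss from the frozen lip-sync expert and, only when the visual quality discriminator is used, an adversarial term. The paper sets the sync weight s_w to 0.03 and the adversarial weight s_g to 0.07; whether configs/wav2lip.yaml and configs/wav2lip_hq.yaml use the same values is an assumption to verify against the config files. The helper is hypothetical and works on scalar loss values.

```python
from typing import Optional


def generator_loss(recon: float, sync: float, gen: Optional[float] = None,
                   s_w: float = 0.03, s_g: float = 0.07) -> float:
    """Weighted generator loss (weights default to the values reported in the paper).

    Without the visual quality discriminator (gen is None):
        (1 - s_w) * recon + s_w * sync
    With it:
        (1 - s_w - s_g) * recon + s_w * sync + s_g * gen
    """
    if gen is None:
        return (1.0 - s_w) * recon + s_w * sync
    return (1.0 - s_w - s_g) * recon + s_w * sync + s_g * gen
```

Because the weights sum to 1 in both cases, the reconstruction term keeps most of the weight and the sync and adversarial terms act as small corrective penalties.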

2.3 Model

Model        Dataset   BatchSize   Inference speed              Download
wav2lip_hq   LRS2      1           0.2853 s/image (GPU: P40)    model
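The per-image speed can be turned into a rough runtime estimate by multiplying by the number of frames. This sketch assumes one "image" corresponds to one generated video frame and ignores face detection, audio processing and video encoding overhead; the 25 fps frame rate in the example is an assumption, not a property of any particular clip.

```python
def estimate_seconds(duration_s: float, fps: float, sec_per_image: float = 0.2853) -> float:
    """Rough generation time: number of frames times the per-frame cost from the table."""
    num_frames = round(duration_s * fps)
    return num_frames * sec_per_image


# A 7-second clip at an assumed 25 fps has 175 frames:
# 175 * 0.2853 s is about 49.9 s, i.e. roughly 3.5 frames generated per second.
print(f"{estimate_seconds(7, 25):.1f} s")
```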


