# SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces [ACM MM 2023]
Official PyTorch implementation for the paper:
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces, ACM MM 2023.
Ziqiao Peng, Yihao Luo, Yue Shi, Hao Xu, Xiangyu Zhu, Hongyan Liu, Jun He, Zhaoxin Fan
arXiv | Project Page | License
Given a speech signal as input, our framework generates realistic 3D talking faces that remain comprehensible: coherent textual information can be recovered from the generated lip motion through the lip-reading interpreter and the speech recognizer.
## Environment

- Linux
- Python 3.6+
- PyTorch 1.12.1
- CUDA 11.3
- ffmpeg
- MPI-IS/mesh
Clone the repo:

```bash
git clone https://github.com/psyai-net/SelfTalk_release.git
cd SelfTalk_release
```

Create the conda environment and install the dependencies:

```bash
conda create -n selftalk python=3.8.8
conda activate selftalk
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
```
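A quick way to confirm the environment was set up correctly is a minimal check of the PyTorch/CUDA versions installed above (a sketch, not part of the release):

```python
# Minimal environment sanity check for the versions installed above.
import torch

print(torch.__version__)          # expected: 1.12.1+cu113
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # should be True on a machine with a CUDA-capable GPU
```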
## Dataset Preparation

### VOCASET

Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files `data_verts.npy`, `raw_audio_fixed.pkl`, `templates.pkl` and `subj_seq_to_idx.pkl` in the folder `vocaset/`. Download "FLAME_sample.ply" from voca and put it in `vocaset/`. Read the vertices/audio data and convert them to .npy/.wav files stored in `vocaset/vertices_npy` and `vocaset/wav`:

```bash
cd vocaset
python process_voca_data.py
```
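After processing, you can sanity-check the converted files with a short sketch like the one below; the exact file names inside `vocaset/vertices_npy` are an assumption, and each frame should flatten to 5023 FLAME vertices * 3 = 15069 values, matching the `--vertice_dim 15069` used later:

```python
# Sketch: inspect one converted VOCASET sequence (exact file names are an assumption).
import glob
import numpy as np

npy_files = sorted(glob.glob("vocaset/vertices_npy/*.npy"))
print(f"{len(npy_files)} vertex sequences found")

verts = np.load(npy_files[0])
print(verts.shape)  # expected: (num_frames, 15069), i.e. 5023 vertices * 3 coordinates
```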
### BIWI

Follow the `BIWI/README.md` to preprocess the BIWI dataset and put the resulting .npy/.wav files into `BIWI/vertices_npy` and `BIWI/wav`, and the `templates.pkl` into `BIWI/`.
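If you want to double-check the BIWI templates, here is a small sketch (assuming, as for vocaset, that `templates.pkl` maps each subject name to its neutral-template vertices):

```python
# Sketch: peek at BIWI/templates.pkl (assumed to map subject name -> neutral template vertices).
import pickle

with open("BIWI/templates.pkl", "rb") as f:
    templates = pickle.load(f)

print(sorted(templates.keys()))              # subject names
print(next(iter(templates.values())).shape)  # expected: (23370, 3), matching --vertice_dim 70110 below
```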
## Demo

Download the pretrained models from BIWI.pth and vocaset.pth. Put the pretrained models under the `BIWI` and `VOCASET` folders, respectively. Given the audio signal,

- to animate a mesh in FLAME topology, run:

  ```bash
  python demo_voca.py --wav_path "demo/wav/test.wav" --subject FaceTalk_170908_03277_TA
  ```

- to animate a mesh in BIWI topology, run:

  ```bash
  python demo_BIWI.py --wav_path "demo/wav/test.wav" --subject M1
  ```

Each script will automatically generate the rendered videos in the `demo/output` folder. You can also put your own test audio file (.wav format) under the `demo/wav` folder and specify the argument `--wav_path "demo/wav/test.wav"` accordingly.
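If you record your own audio, the sketch below (assuming `torchaudio` from the environment above; the input file name is a placeholder) converts it to the mono 16 kHz format that Wav2Vec2-style audio encoders expect:

```python
# Sketch: convert your own recording to mono 16 kHz before running the demo.
# "demo/wav/my_recording.wav" is a placeholder file name.
import torchaudio

wav, sr = torchaudio.load("demo/wav/my_recording.wav")
wav = wav.mean(dim=0, keepdim=True)                   # downmix to mono
wav = torchaudio.functional.resample(wav, sr, 16000)  # resample to 16 kHz
torchaudio.save("demo/wav/test.wav", wav, 16000)
```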
## Training and Testing on VOCASET

- Read the vertices/audio data and convert them to .npy/.wav files stored in `vocaset/vertices_npy` and `vocaset/wav`:

  ```bash
  cd vocaset
  python process_voca_data.py
  ```

- To train the model on VOCASET, run:

  ```bash
  python main.py --dataset vocaset --vertice_dim 15069 --feature_dim 512 --period 30 --train_subjects "FaceTalk_170728_03272_TA FaceTalk_170904_00128_TA FaceTalk_170725_00137_TA FaceTalk_170915_00223_TA FaceTalk_170811_03274_TA FaceTalk_170913_03279_TA FaceTalk_170904_03276_TA FaceTalk_170912_03278_TA" --val_subjects "FaceTalk_170811_03275_TA FaceTalk_170908_03277_TA" --test_subjects "FaceTalk_170809_00138_TA FaceTalk_170731_00024_TA"
  ```

- To test the model on VOCASET, run:

  ```bash
  python test.py --dataset vocaset --vertice_dim 15069 --feature_dim 512 --period 30 --max_epoch 100 --train_subjects "FaceTalk_170728_03272_TA FaceTalk_170904_00128_TA FaceTalk_170725_00137_TA FaceTalk_170915_00223_TA FaceTalk_170811_03274_TA FaceTalk_170913_03279_TA FaceTalk_170904_03276_TA FaceTalk_170912_03278_TA" --val_subjects "FaceTalk_170811_03275_TA FaceTalk_170908_03277_TA" --test_subjects "FaceTalk_170809_00138_TA FaceTalk_170731_00024_TA"
  ```

  The results and the trained models will be saved to `vocaset/result` and `vocaset/save`.

- To visualize the results, run:

  ```bash
  python render.py --dataset vocaset --vertice_dim 15069 --fps 30
  ```

  You can find the outputs in the `vocaset/output` folder.
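To inspect a prediction outside of `render.py`, here is a hedged sketch (how files inside `vocaset/result` are named is an assumption):

```python
# Sketch: load one predicted VOCASET sequence and reshape it to per-frame FLAME vertices.
import glob
import numpy as np

pred = np.load(sorted(glob.glob("vocaset/result/*.npy"))[0])
frames = pred.reshape(pred.shape[0], -1, 3)  # expected: (num_frames, 5023, 3) for --vertice_dim 15069
print(frames.shape)
```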
## Training and Testing on BIWI

- Follow the `BIWI/README.md` to preprocess the BIWI dataset.

- To train the model on BIWI, run:

  ```bash
  python main.py --dataset BIWI --vertice_dim 70110 --feature_dim 1024 --period 25 --train_subjects "F2 F3 F4 M3 M4 M5" --val_subjects "F2 F3 F4 M3 M4 M5" --test_subjects "F2 F3 F4 M3 M4 M5"
  ```

- To test the model on BIWI, run:

  ```bash
  python test.py --dataset BIWI --vertice_dim 70110 --feature_dim 1024 --period 25 --max_epoch 100 --train_subjects "F2 F3 F4 M3 M4 M5" --val_subjects "F2 F3 F4 M3 M4 M5" --test_subjects "F2 F3 F4 M3 M4 M5"
  ```

  The results will be available in the `BIWI/result` folder. The trained models will be saved in the `BIWI/save` folder.

- To visualize the results, run:

  ```bash
  python render.py --dataset BIWI --vertice_dim 70110 --fps 25
  ```

  The rendered videos will be available in the `BIWI/output` folder.
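For a rough numerical sanity check (this is not the evaluation protocol from the paper), you can compare a prediction against its ground-truth sequence; how files in `BIWI/result` are named and paired with `BIWI/vertices_npy` is an assumption, and the file names below are placeholders:

```python
# Sketch: mean per-vertex L2 distance between a predicted and a ground-truth BIWI sequence.
# File names/pairing are placeholders; this is a rough sanity check, not the paper's metric.
import numpy as np

pred = np.load("BIWI/result/F2_sample.npy")      # placeholder prediction file
gt = np.load("BIWI/vertices_npy/F2_sample.npy")  # placeholder ground-truth file

T = min(len(pred), len(gt))                      # align sequence lengths if they differ
pred = pred[:T].reshape(T, -1, 3)                # (T, 23370, 3) for --vertice_dim 70110
gt = gt[:T].reshape(T, -1, 3)

error = np.linalg.norm(pred - gt, axis=-1).mean()
print(f"mean per-vertex error: {error:.6f}")
```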
## Training on Your Own Dataset

- Create the dataset directory `<dataset_dir>` in the `SelfTalk_release` directory.
- Place your vertices data (.npy format) and audio data (.wav format) in the `<dataset_dir>/vertices_npy` and `<dataset_dir>/wav` folders, respectively.
- Save the templates of all subjects to a `templates.pkl` file and put it in `<dataset_dir>`, as done for BIWI and vocaset (see the sketch after this list). Export an arbitrary template to .ply format and put it in `<dataset_dir>/templates/`.
- Create the train, val and test splits by specifying the arguments `--train_subjects`, `--val_subjects` and `--test_subjects` in `main.py`.
- Train a SelfTalk model on your own dataset by specifying the arguments `--dataset` and `--vertice_dim` (number of vertices in your mesh * 3) in `main.py`. You might need to adjust `--feature_dim` and `--period` for your dataset. Run `main.py`.
- The results and models will be saved to `<dataset_dir>/result` and `<dataset_dir>/save`.
- Specify the arguments `--dataset`, `--vertice_dim` and `--fps` in `render.py`. Run `render.py` to visualize the results. The rendered videos will be saved to `<dataset_dir>/output`.
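Here is a minimal sketch of how `templates.pkl` could be written for your own data, assuming the same convention as BIWI and vocaset (a dict mapping each subject name to its neutral-template vertices); subject names and the vertex count are placeholders:

```python
# Sketch: write <dataset_dir>/templates.pkl as {subject_name: (num_vertices, 3) array}.
# Subject names and the vertex count below are placeholders for your own data.
import pickle
import numpy as np

templates = {
    "subject001": np.zeros((5023, 3), dtype=np.float64),  # replace with the real neutral mesh
    "subject002": np.zeros((5023, 3), dtype=np.float64),
}

with open("<dataset_dir>/templates.pkl", "wb") as f:
    pickle.dump(templates, f)
```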
## Citation

If you find this work useful for your research, please cite our paper:

```bibtex
@inproceedings{peng2023selftalk,
  title={SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces},
  author={Ziqiao Peng and Yihao Luo and Yue Shi and Hao Xu and Xiangyu Zhu and Hongyan Liu and Jun He and Zhaoxin Fan},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}
```
## Acknowledgement

Here are some great resources we benefited from:

- FaceFormer for the pipeline and README
- CodeTalker for BIWI dataset preprocessing
- FaceXHuBERT for BIWI audio processing
- B3D(AC)2 and VOCASET for the datasets
- Wav2Vec2 for the audio encoder
- MPI-IS/mesh for mesh processing
- VOCA/rendering for rendering
## Contact

For research purposes, please contact [email protected]

For commercial licensing, please contact [email protected]

## License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. Please read the LICENSE file for more information.
## Recruitment

We invite you to join Psyche AI Inc to conduct cutting-edge research and business implementation together. At Psyche AI Inc, we are committed to pushing the boundaries of what's possible in the fields of artificial intelligence and computer vision, especially their applications in avatars. As a member of our team, you will have the opportunity to collaborate with talented individuals, innovate on new ideas, and contribute to projects that have a real-world impact.

If you are passionate about working at the forefront of technology and making a difference, we would love to hear from you. Please visit our website at Psyche AI Inc to learn more about us and to apply for open positions. You can also contact us at [email protected].

Let's shape the future together!