
pegahs1993/Whisper-AFE-TalkingHeadsGen


Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis

arXiv


(a) System architecture of the interactive child avatar, detailing the integration of key modules: (1) Listening, (2) STT, (3) Language Processing, (4) TTS, (5) AFE, (6) Frames Rendering, and (7) Audio Overlay. This setup simulates natural conversation, allowing the user to interact with the avatar as if communicating with a real person. (b) User interaction with the child avatar system.

Result based on RAD-NeRF

Watch the first video

Result based on ER-NeRF

Watch the second video

Audio Feature Extraction (AFE)

Specify the audio feature type when training and testing frameworks such as ER-NeRF and RAD-NeRF.

DeepSpeech

To extract features with DeepSpeech, use the following command:

python AFEs/deepspeech_features/extract_ds_features.py --input data/<name>.wav # save to data/

HuBERT

To extract features with HuBERT, use the following command:

python AFEs/hubert.py --wav data/<name>.wav # save to data/<name>_hu.npy

Wav2Vec

To extract features with Wav2Vec, use the following command:

python AFEs/wav2vec.py --wav data/<name>.wav --save_feats # save to data/<name>_eo.npy

Whisper

To extract features with Whisper, use the following command:

python AFEs/whisper.py 
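The extraction scripts above save features as NumPy `.npy` arrays (e.g. `data/<name>_hu.npy` for HuBERT, `data/<name>_eo.npy` for Wav2Vec). Before training, it can help to sanity-check a saved file. The sketch below is a hypothetical example: the `(num_frames, feature_dim)` layout and the 1024-dimensional features are assumptions for illustration, and the actual shape depends on which extractor you ran.

```python
import numpy as np

# Hypothetical stand-in for a real extracted feature file; actual
# shapes depend on the extractor (here: 250 frames x 1024 dims).
features = np.random.rand(250, 1024).astype(np.float32)
np.save("demo_hu.npy", features)

# Load a saved feature file and sanity-check it before training.
loaded = np.load("demo_hu.npy")
print(loaded.shape)   # (250, 1024)
print(loaded.dtype)   # float32
assert loaded.ndim == 2, "expected (num_frames, feature_dim)"
```

Replace `demo_hu.npy` with the path your extraction script reported, and check that the frame count roughly matches the audio length times the extractor's frame rate.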
