Figure: (a) System architecture of the interactive child avatar, detailing the integration of key modules: (1) Listening, (2) STT, (3) Language Processing, (4) TTS, (5) AFE, (6) Frame Rendering, and (7) Audio Overlay. This setup simulates natural conversation, allowing the user to interact with the avatar as if communicating with a real person. (b) User interaction with the child avatar system.
You must specify the audio feature type when training and testing frameworks such as ER-NeRF and RAD-NeRF, since each AFE below writes its features to a different file.
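For illustration, a small helper like the following shows how the chosen feature type maps to the output files produced by the commands below. The function and mapping are ours, not part of the repo, and the DeepSpeech output name is an assumption, since its inline comment only gives the directory:

```python
from pathlib import Path

# Hypothetical mapping from AFE name to the feature file the extraction
# commands below produce (suffixes taken from their inline comments).
# Whisper is omitted: its command shows no output path; check AFEs/whisper.py.
AFE_OUTPUTS = {
    "deepspeech": "{name}.npy",  # assumed name; the command only says "save to data/"
    "hubert": "{name}_hu.npy",
    "wav2vec": "{name}_eo.npy",
}

def feature_path(name: str, afe: str, data_dir: str = "data") -> Path:
    """Return the expected feature file for data/<name>.wav under the given AFE."""
    path = Path(data_dir) / AFE_OUTPUTS[afe].format(name=name)
    if not path.exists():
        raise FileNotFoundError(f"run the {afe} extraction command first ({path})")
    return path
```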
To extract features with DeepSpeech, use the following command:

```bash
python AFEs/deepspeech_features/extract_ds_features.py --input data/<name>.wav # save to data/
```
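As a quick sanity check, you can load the result and inspect its shape. This is a sketch only: the output file name and the windowed per-video-frame layout are assumptions based on AD-NeRF-style DeepSpeech extractors, so verify against your actual output:

```python
import numpy as np

name = "sample"  # hypothetical clip name; substitute your <name>
feats = np.load(f"data/{name}.npy")  # assumed output path, see note above
# AD-NeRF-style extractors emit one window of DeepSpeech logits per video frame.
print(feats.shape, feats.dtype)
```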
To extract features with HuBERT, use the following command:

```bash
python AFEs/hubert.py --wav data/<name>.wav # save to data/<name>_hu.npy
```
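To confirm the extraction covered the whole clip, compare the feature count against the audio duration; HuBERT (like wav2vec 2.0) strides at 20 ms, so expect roughly 50 feature frames per second. A minimal sketch, assuming the .npy stores one row per HuBERT frame and the wav is plain PCM:

```python
import wave
import numpy as np

name = "sample"  # hypothetical clip name; substitute your <name>
feats = np.load(f"data/{name}_hu.npy")
with wave.open(f"data/{name}.wav") as w:
    duration = w.getnframes() / w.getframerate()

print(f"{feats.shape[0]} feature frames for {duration:.2f}s of audio "
      f"(~{feats.shape[0] / duration:.0f} Hz, expected ~50)")
```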
To extract features with Wav2Vec, use the following command:

```bash
python AFEs/wav2vec.py --wav data/<name>.wav --save_feats # save to data/<name>_eo.npy
```
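The `_eo` suffix follows RAD-NeRF's naming for its wav2vec 2.0 features. Since wav2vec 2.0 also emits roughly 50 frames per second while talking-head video is typically 25 fps, each video frame corresponds to about two audio frames. The frameworks handle this alignment internally; the following is only a naive illustration of the ratio:

```python
import numpy as np

name = "sample"  # hypothetical clip name; substitute your <name>
feats = np.load(f"data/{name}_eo.npy")

audio_hz, video_fps = 50, 25  # wav2vec 2.0 frame rate vs. a typical video frame rate
step = audio_hz // video_fps
per_frame = feats[::step]  # crude nearest-frame alignment, for illustration only
print(feats.shape, "->", per_frame.shape)
```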
To extract features with Whisper, use the following command (as invoked it takes no arguments, so check AFEs/whisper.py for the input and output paths it uses):

```bash
python AFEs/whisper.py
```
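For orientation, here is a minimal encoder-feature sketch using the openai-whisper package. This is an assumption about what such a script might do, not the repo's code; the model size, paths, and output name are all hypothetical:

```python
import numpy as np
import torch
import whisper  # pip install openai-whisper

model = whisper.load_model("base")             # hypothetical model size
audio = whisper.load_audio("data/sample.wav")  # hypothetical input path
audio = whisper.pad_or_trim(audio)             # Whisper operates on 30 s windows
mel = whisper.log_mel_spectrogram(audio).to(model.device)

with torch.no_grad():
    # embed_audio runs only the encoder, yielding (1, 1500, d_model) hidden states
    feats = model.embed_audio(mel.unsqueeze(0))

np.save("data/sample_whisper.npy", feats.squeeze(0).cpu().numpy())  # hypothetical output name
```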