Figure: (a) System architecture of the interactive child avatar, detailing the integration of key modules: (1) Listening, (2) STT, (3) Language Processing, (4) TTS, (5) AFE, (6) Frame Rendering, and (7) Audio Overlay. This setup simulates natural conversation, allowing the user to interact with the avatar as if communicating with a real person. (b) User interaction with the child avatar system.
You must specify the audio feature type when training and testing frameworks such as ER-NeRF and RAD-NeRF, since each AFE below writes its features to a different file.
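For illustration, a small helper like the following shows how the chosen feature type maps to the output files produced by the commands below. The function and mapping are ours, not part of the repo, and the DeepSpeech output name is an assumption, since its inline comment only gives the directory:

```python
from pathlib import Path

# Hypothetical mapping from AFE name to the feature file the extraction
# commands below produce (suffixes taken from their inline comments).
# Whisper is omitted: its command shows no output path; check AFEs/whisper.py.
AFE_OUTPUTS = {
    "deepspeech": "{name}.npy",  # assumed name; the command only says "save to data/"
    "hubert": "{name}_hu.npy",
    "wav2vec": "{name}_eo.npy",
}

def feature_path(name: str, afe: str, data_dir: str = "data") -> Path:
    """Return the expected feature file for data/<name>.wav under the given AFE."""
    path = Path(data_dir) / AFE_OUTPUTS[afe].format(name=name)
    if not path.exists():
        raise FileNotFoundError(f"run the {afe} extraction command first ({path})")
    return path
```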
To extract features with DeepSpeech, use the following command:

```bash
python AFEs/deepspeech_features/extract_ds_features.py --input data/<name>.wav # save to data/
```
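As a quick sanity check, you can load the result and inspect its shape. This is a sketch only: the output file name and the windowed per-video-frame layout are assumptions based on AD-NeRF-style DeepSpeech extractors, so verify against your actual output:

```python
import numpy as np

name = "sample"  # hypothetical clip name; substitute your <name>
feats = np.load(f"data/{name}.npy")  # assumed output path, see note above
# AD-NeRF-style extractors emit one window of DeepSpeech logits per video frame.
print(feats.shape, feats.dtype)
```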
To extract features with HuBERT, use the following command:

```bash
python AFEs/hubert.py --wav data/<name>.wav # save to data/<name>_hu.npy
```
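To confirm the extraction covered the whole clip, compare the feature count against the audio duration; HuBERT (like wav2vec 2.0) strides at 20 ms, so expect roughly 50 feature frames per second. A minimal sketch, assuming the .npy stores one row per HuBERT frame and the wav is plain PCM:

```python
import wave
import numpy as np

name = "sample"  # hypothetical clip name; substitute your <name>
feats = np.load(f"data/{name}_hu.npy")
with wave.open(f"data/{name}.wav") as w:
    duration = w.getnframes() / w.getframerate()

print(f"{feats.shape[0]} feature frames for {duration:.2f}s of audio "
      f"(~{feats.shape[0] / duration:.0f} Hz, expected ~50)")
```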
To extract features with Wav2Vec, use the following command:

```bash
python AFEs/wav2vec.py --wav data/<name>.wav --save_feats # save to data/<name>_eo.npy
```
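The `_eo` suffix follows RAD-NeRF's naming for its wav2vec 2.0 features. Since wav2vec 2.0 also emits roughly 50 frames per second while talking-head video is typically 25 fps, each video frame corresponds to about two audio frames. The frameworks handle this alignment internally; the following is only a naive illustration of the ratio:

```python
import numpy as np

name = "sample"  # hypothetical clip name; substitute your <name>
feats = np.load(f"data/{name}_eo.npy")

audio_hz, video_fps = 50, 25  # wav2vec 2.0 frame rate vs. a typical video frame rate
step = audio_hz // video_fps
per_frame = feats[::step]  # crude nearest-frame alignment, for illustration only
print(feats.shape, "->", per_frame.shape)
```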
To extract features with Whisper, use the following command (as invoked it takes no arguments, so check AFEs/whisper.py for the input and output paths it uses):

```bash
python AFEs/whisper.py
```
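For orientation, here is a minimal encoder-feature sketch using the openai-whisper package. This is an assumption about what such a script might do, not the repo's code; the model size, paths, and output name are all hypothetical:

```python
import numpy as np
import torch
import whisper  # pip install openai-whisper

model = whisper.load_model("base")             # hypothetical model size
audio = whisper.load_audio("data/sample.wav")  # hypothetical input path
audio = whisper.pad_or_trim(audio)             # Whisper operates on 30 s windows
mel = whisper.log_mel_spectrogram(audio).to(model.device)

with torch.no_grad():
    # embed_audio runs only the encoder, yielding (1, 1500, d_model) hidden states
    feats = model.embed_audio(mel.unsqueeze(0))

np.save("data/sample_whisper.npy", feats.squeeze(0).cpu().numpy())  # hypothetical output name
```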