2025-01-14-zhao25a.md

title: "AMG-AVSR: Adaptive Modality Guidance for Audio-Visual Speech Recognition via Progressive Feature Enhancement"
booktitle: Proceedings of the 16th Asian Conference on Machine Learning
year: 2025
volume: 260
series: Proceedings of Machine Learning Research
month: 0
publisher: PMLR
pdf:
url:
openreview: sXkQhSX3Ib
abstract: "Audio-Visual Speech Recognition (AVSR) is a task that identifies spoken words by analyzing both lip movements and auditory signals. Compared to Automatic Speech Recognition (ASR), AVSR demonstrates greater robustness in noisy environments due to the support of dual modalities. However, the inherent differences between these modalities present a challenge: effectively accounting for their disparities and leveraging their complementary information to extract useful information for AVSR. To address this, we propose the AMG-AVSR model, which utilizes a two-stage curriculum learning strategy and incorporates a feature compression and recovery mechanism. By leveraging the characteristics of different modalities in various scenarios to guide each other, the model extracts refined features from audio-visual data, thereby enhancing recognition performance in both clean and noisy environments. Compared to the baseline model AV-HuBERT, AMG-AVSR demonstrates superior performance on the LRS2 dataset in both noisy and clean environments. AMG-AVSR achieves a word error rate (WER) of 2.9% under clean speech conditions. In various noisy conditions, AMG-AVSR shows a significant reduction in WER compared to previous methods."
layout: inproceedings
issn: 2640-3498
id: zhao25a
tex_title: "{AMG-AVSR}: {A}daptive Modality Guidance for Audio-Visual Speech Recognition via Progressive Feature Enhancement"
firstpage: 952
lastpage: 967
page: 952-967
order: 952
cycles: false
bibtex_editor: Nguyen, Vu and Lin, Hsuan-Tien
editor:
- given: Vu
  family: Nguyen
- given: Hsuan-Tien
  family: Lin
bibtex_author: Zhao, Zhishuo and Guo, Dongyue and Ou, Wenjie and Liu, Hong and Lin, Yi
author:
- given: Zhishuo
  family: Zhao
- given: Dongyue
  family: Guo
- given: Wenjie
  family: Ou
- given: Hong
  family: Liu
- given: Yi
  family: Lin
date: 2025-01-14
address:
container-title: Proceedings of the 16th Asian Conference on Machine Learning
genre: inproceedings
issued:
  date-parts:
  - 2025
  - 1
  - 14
extras: