title: "AMG-AVSR: Adaptive Modality Guidance for Audio-Visual Speech Recognition via Progressive Feature Enhancement"
booktitle: Proceedings of the 16th Asian Conference on Machine Learning
year: 2025
volume: 260
series: Proceedings of Machine Learning Research
month: 0
publisher: PMLR
pdf: https://raw.githubusercontent.com/mlresearch/v260/main/assets/zhao25a/zhao25a.pdf
url: https://proceedings.mlr.press/v260/zhao25a.html
openreview: sXkQhSX3Ib
abstract: "Audio-Visual Speech Recognition (AVSR) identifies spoken words by analyzing both lip movements and auditory signals. Compared to Automatic Speech Recognition (ASR), AVSR is more robust in noisy environments because it draws on two modalities. However, the inherent differences between these modalities present a challenge: accounting for their disparities while leveraging their complementary information to extract features useful for AVSR. To address this, we propose the AMG-AVSR model, which uses a two-stage curriculum learning strategy and incorporates a feature compression and recovery mechanism. By letting the modalities guide each other according to their characteristics in different scenarios, the model extracts refined features from audio-visual data, thereby enhancing recognition performance in both clean and noisy environments. Compared to the baseline model AV-HuBERT, AMG-AVSR demonstrates superior performance on the LRS2 dataset in both noisy and clean environments. AMG-AVSR achieves a word error rate (WER) of 2.9% under clean speech conditions. In various noisy conditions, AMG-AVSR shows a significant reduction in WER compared to previous methods."
layout: inproceedings
issn: 2640-3498
id: zhao25a
tex_title: "{AMG-AVSR}: {A}daptive Modality Guidance for Audio-Visual Speech Recognition via Progressive Feature Enhancement"
firstpage: 952
lastpage: 967
page: 952-967
order: 952
cycles: false
bibtex_editor: Nguyen, Vu and Lin, Hsuan-Tien
editor:
- given: Vu
  family: Nguyen
- given: Hsuan-Tien
  family: Lin
bibtex_author: Zhao, Zhishuo and Guo, Dongyue and Ou, Wenjie and Liu, Hong and Lin, Yi
author:
- given: Zhishuo
  family: Zhao
- given: Dongyue
  family: Guo
- given: Wenjie
  family: Ou
- given: Hong
  family: Liu
- given: Yi
  family: Lin
date: 2025-01-14
address:
container-title: Proceedings of the 16th Asian Conference on Machine Learning
genre: inproceedings
issued:
  date-parts:
  - 2025
  - 1
  - 14
extras:


2025-01-14-zhao25a.md
