2024
DOI: 10.1609/aaai.v38i17.29926
|View full text |Cite
|
Sign up to set email alerts
|

Visual Hallucination Elevates Speech Recognition

Fang Zhang,
Yongxin Zhu,
Xiangxiang Wang
et al.

Abstract: Due to the detrimental impact of noise on the conventional audio speech recognition (ASR) task, audio-visual speech recognition~(AVSR) has been proposed by incorporating both audio and visual video signals. Although existing methods have demonstrated that the aligned visual input of lip movements can enhance the robustness of AVSR systems against noise, the paired videos are not always available during inference, leading to the problem of the missing visual modality, which restricts their practicality in rea… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 26 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?