2022 IEEE International Conference on Image Processing (ICIP) 2022
DOI: 10.1109/icip46576.2022.9897235
|View full text |Cite
|
Sign up to set email alerts
|

Learning Contextually Fused Audio-Visual Representations For Audio-Visual Speech Recognition

Abstract: With the advance in self-supervised learning for audio and visual modalities, it has become possible to learn a robust audio-visual speech representation. This would be beneficial for improving the audio-visual speech recognition (AVSR) performance, as the multimodal inputs contain more fruitful information in principle. In this paper, based on existing self-supervised representation learning methods for audio modality, we therefore propose an audio-visual representation learning approach. The proposed approac… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
2
2

Relationship

0
4

Authors

Journals

citations
Cited by 4 publications
references
References 35 publications
(48 reference statements)
0
0
0
Order By: Relevance