2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00510

Sub-word Level Lip Reading With Visual Attention

Cited by 51 publications (36 citation statements). References 47 publications.
“…Random cropping with size 88×88 and horizontal flipping are also performed for each video during training. We also follow Prajwal et al. [37] in using a central crop with horizontal flipping at test time for visual-only experiments.…”
Section: Implementation Details (mentioning)
confidence: 99%
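The cropping and flipping scheme quoted above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration assuming torchvision-style transforms applied to (T, C, H, W) mouth-crop tensors; it is not the cited paper's implementation, and the function name augment_clip is an assumption for this example.

```python
# Minimal sketch of the augmentation described in the citation above,
# assuming torchvision transforms and clips stored as (T, C, H, W) tensors.
import torch
from torchvision import transforms

# Training: random 88x88 crop plus random horizontal flip. Applying the
# transform to the whole clip tensor keeps the crop/flip consistent
# across all frames of the video.
train_transform = transforms.Compose([
    transforms.RandomCrop(88),
    transforms.RandomHorizontalFlip(p=0.5),
])

# Test time (visual-only): central 88x88 crop; the quoted work also uses
# horizontal flipping at test time (e.g. averaging over the flipped copy).
test_transform = transforms.CenterCrop(88)


def augment_clip(clip: torch.Tensor, train: bool) -> torch.Tensor:
    """Apply the spatial transform to a clip of mouth-region frames.

    clip: float tensor of shape (T, C, H, W); H and W are assumed to be
    at least 88 (e.g. 96x96 mouth crops).
    """
    transform = train_transform if train else test_transform
    return transform(clip)
```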
“…Other works focus on Visual Speech Recognition (VSR), using only lip movements to transcribe spoken language into text [4,9,48,3,49,37,30]. An important line of research is the use of cross-modal distillation.…”
(mentioning)
confidence: 99%
“…Audio and visual speech are two separate modalities that convey speech content. Numerous works [42,12,1,2,44,24,26] have explored ways to extract information from speech using these modalities. Speech recognition [42,6,21] is widely used in online meetings and social applications to recognize speech content.…”
Section: Related Work 2.1 Audio-visual Speech (mentioning)
confidence: 99%
“…Keyword spotting [5,49,28] is employed in short-video applications to quickly retrieve relevant content. Additionally, in noisy scenarios, related speech tasks [13,20,44,39] rely on visual speech to avoid interference from surrounding speech and background noise. Despite the growing interest in speech tasks that rely on visual speech, research [54,57] on visual speech translation is limited and lacks validation due to the lack of multilingual audio-visual speech transcription datasets.…”
Section: Related Work 2.1 Audio-visual Speech (mentioning)
confidence: 99%