2019
DOI: 10.48550/arxiv.1911.11502
Preprint

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

Abstract: Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading unfortunately remains inferior to that of its counterpart, speech recognition, due to the ambiguous nature of its actuations, which makes it challenging to extract discriminative features from lip movement videos. In this paper, we propose a new method, termed Lip by Speech (LIBS), of which the goa…
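The abstract describes distilling a speech recognizer (teacher) into a lip reader (student). As a rough illustration only — not the paper's actual multi-scale method — the generic knowledge-distillation objective it builds on combines a hard-label cross-entropy with a temperature-softened KL term against the teacher's posteriors. A minimal numpy sketch, with the temperature `T` and mixing weight `alpha` as assumed hyperparameters:

```python
import numpy as np

def softmax(z, T):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hard-label cross-entropy plus KL(teacher || student) at temperature T.

    student_logits, teacher_logits: (batch, classes); labels: (batch,) int class ids.
    The T**2 factor keeps the soft-term gradient scale comparable across temperatures.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()
    log_p = np.log(softmax(student_logits, 1.0) + 1e-12)
    ce = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * ce + (1 - alpha) * (T ** 2) * kl
```

When teacher and student logits agree, the KL term vanishes and only the hard-label loss remains; a mismatched teacher strictly increases the loss.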

Cited by 3 publications (8 citation statements)
References 16 publications
“…Note that experiment on GRID dataset needs more training steps, since it is trained with its visual frontend together from scratch, different from experiments on LRS2 dataset. Moreover, the first 45k steps in warm-up stage for LRS2 are trained on LRS2-pretrain sub-dataset and all the left steps are trained on LRS2-main sub-dataset [1,2,33].…”
Section: Training Setup
confidence: 99%
“…The former surpasses the performance of all previous work on LRS2-BBC dataset by a large margin. To boost the performance of lipreading, Petridis et al [19] present a hybrid CTC/Attention architecture aiming to obtain the better alignment than attention-only mechanism, Zhao et al [33] provide the idea that transferring knowledge from audio-speech recognition model to lipreading model by distillation.…”
Section: Related Work 2.1 Autoregressive Deep Lipreading
confidence: 99%