2014 IEEE International Conference on Image Processing (ICIP) 2014
DOI: 10.1109/icip.2014.7025274

Resolution limits on visual speech recognition

Abstract: Visual-only speech recognition depends upon a number of factors that can be difficult to control, such as lighting, identity, motion, emotion, and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to…
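The abstract describes varying video resolution and measuring its effect on recognition accuracy. A minimal sketch of how reduced spatial resolution might be simulated for such an experiment, by block-averaging a grayscale frame (the function name and factor are illustrative, not from the paper):

```python
import numpy as np

def downsample_frame(frame: np.ndarray, factor: int) -> np.ndarray:
    """Reduce spatial resolution by averaging non-overlapping
    factor x factor pixel blocks, a simple stand-in for a
    lower-resolution camera."""
    h, w = frame.shape
    h2, w2 = h - h % factor, w - w % factor  # crop to a multiple of factor
    blocks = frame[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.mean(axis=(1, 3))

# Example: a 120x160 grayscale mouth-region frame reduced 4x.
frame = np.random.rand(120, 160)
low_res = downsample_frame(frame, 4)
print(low_res.shape)  # (30, 40)
```

Recognizers trained on full-resolution features could then be tested on frames passed through this reduction at several factors.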

Cited by 18 publications (14 citation statements). References 11 publications.
“…For each frame a single feature vector is extracted which is the concatenation of the shape and appearance parameters. There are many examples of speaker-dependent AAMs improving MLR [2].…”
Section: A. Features
confidence: 99%
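The citation statement above describes the per-frame feature as the concatenation of AAM shape and appearance parameters. A minimal sketch of that construction, assuming the parameters have already been fitted per frame (the function names and dimensions are illustrative):

```python
import numpy as np

def aam_feature_vector(shape_params: np.ndarray,
                       appearance_params: np.ndarray) -> np.ndarray:
    """Per-frame feature: concatenation of AAM shape and appearance
    parameter vectors, as described in the citing paper."""
    return np.concatenate([shape_params, appearance_params])

def sequence_features(shape_seq, appearance_seq) -> np.ndarray:
    """Stack per-frame vectors into a (frames, dims) observation
    matrix of the kind typically fed to an HMM-based recognizer."""
    return np.stack([aam_feature_vector(s, a)
                     for s, a in zip(shape_seq, appearance_seq)])

# e.g. 10 shape + 20 appearance parameters over 75 frames
shapes = [np.zeros(10) for _ in range(75)]
apps = [np.zeros(20) for _ in range(75)]
print(sequence_features(shapes, apps).shape)  # (75, 30)
```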
“…Speaker appearance, or identity, is known to be important in the recognition of speech from visual-only information (lipreading) [33], more so than in auditory speech. Indeed, appearance data improves lipreading classification over shape-only models whether one uses Active Appearance Model (AAM) [28] or Discrete Cosine Transform (DCT) [10] features. In machine lipreading we have interesting evidence: we can both identify individuals from visual speech information [34,35] and, with deep learning and big data, we have the potential to generalise over many speakers [8,36].…”
Section: Speaker-specific Visemes
confidence: 99%
“…Video quality: Yes [14][15][16]; Yes [9]. Unit choice: Yes [17][18][19][20][21]; Yes [3,4,22,23,24]. Classifier technology: Yes [17,25,26,27,28]. Multiple persons: Yes [29,30,31,32]…”
Section: Video Quality
confidence: 99%