Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96 1996
DOI: 10.1109/icslp.1996.607024
Speechreading using shape and intensity information

Abstract: We describe a speechreading system that uses both shape information from the lip contours and intensity information from the mouth area.

Cited by 52 publications (32 citation statements)
References 10 publications
“…Various feature sets have been tested including DCT coefficients [26], active shape models [27] (ASMs), active appearance models (AAMs) and sieve features [22]. The previous section has described papers in which the recognition performance of some of these features has been studied, and the current finding is that AAM features give the best recognition performance overall [24], despite their poor performance in [6].…”
Section: Active Appearance Models
confidence: 99%
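The DCT coefficients mentioned in the quote above are a standard appearance-based feature for speechreading: take the 2-D DCT of the mouth-region pixel intensities and keep only the low-frequency block. A minimal sketch, using a hand-built orthonormal DCT-II matrix (the function names and the 32×32 ROI size are illustrative assumptions, not from the cited papers):

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    m = np.arange(n)[None, :]   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalization
    return c

def dct_features(mouth_roi, k=4):
    """Keep the k x k low-frequency 2-D DCT block as a feature vector."""
    n = mouth_roi.shape[0]      # assumes a square grayscale ROI
    c = dct2_matrix(n)
    coeffs = c @ mouth_roi @ c.T   # separable 2-D DCT
    return coeffs[:k, :k].flatten()

# Example on a synthetic 32x32 "mouth" patch
roi = np.random.default_rng(0).random((32, 32))
feats = dct_features(roi, k=4)
print(feats.shape)  # (16,)
```

With the orthonormal scaling used here, the first coefficient is `roi.sum() / n`, i.e. a scaled mean intensity of the patch.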
“…Within this category belong geometric type features, such as mouth height, width, and area [19], [22], [26], [28], [29], [32][33][34][35], [49][50][51][52], Fourier and image moment descriptors of the lip contours [28], [53], statistical models of shape, such as active shape models [48], [54], or other parameters of lip-tracking models [44], [55][56][57]. Finally, features from both categories can be concatenated into a joint shape and appearance vector [27], [44], [58], [59], or a joint statistical model can be learned on such vectors, as is the case of the active appearance model [60], used for speechreading in [48].…”
Section: The Visual Front End
confidence: 99%
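The geometric features the survey lists (mouth height, width, and area) can be computed directly from tracked lip-contour points. A hypothetical sketch, assuming the contour is an (N, 2) array of (x, y) points tracing the outer lip boundary (the function name and input format are illustrative, not from the cited papers):

```python
import numpy as np

def geometric_lip_features(contour):
    """Mouth width, height, and enclosed area from lip contour points.

    contour: (N, 2) array of (x, y) vertices of the outer lip boundary,
    listed in order around the polygon.
    """
    xs, ys = contour[:, 0], contour[:, 1]
    width = xs.max() - xs.min()    # horizontal mouth extent
    height = ys.max() - ys.min()   # vertical mouth opening
    # Shoelace formula for the area of the enclosed polygon
    area = 0.5 * abs(np.dot(xs, np.roll(ys, 1)) - np.dot(ys, np.roll(xs, 1)))
    return np.array([width, height, area])

# Example: a square "mouth" of side 2
square = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], dtype=float)
print(geometric_lip_features(square))  # [2. 2. 4.]
```

Such low-dimensional shape vectors are typically concatenated with appearance features (e.g. DCT coefficients) to form the joint shape-and-appearance representations the quote describes.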
“…1). The first scenario is primarily useful in benchmarking the performance of visual feature extraction algorithms, with visual-only ASR results typically reported on small-vocabulary tasks [24], [25], [28][29][30][31], [36], [40][41][42][43], [46], [59], [66], [78], [84][85][86][87][88][89][90][91][92]. Visual speech modeling is required in this process, its two central aspects being the choice of speech classes, that are assumed to generate the observed features, and the statistical modeling of this generation process.…”
Section: Visual Speech Modeling for ASR
confidence: 99%
“…The visual features are often derived from the shape of the mouth [9], [4], [2]. Although very popular, these methods rely exclusively on the accurate detection of the lip contours, which is often a challenging task under varying illumination conditions or rotations of the face.…”
Section: Related Work
confidence: 99%