Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269)
DOI: 10.1109/icip.1998.999008

An image transform approach for HMM based automatic lipreading

Abstract: This paper concentrates on the visual front end for hidden Markov model based automatic lipreading. Two approaches for extracting features relevant to lipreading, given image sequences of the speaker's mouth region, are considered: A lip contour based feature approach, which first obtains estimates of the speaker's lip contours and subsequently extracts features from them, and an image transform based approach, which obtains a compressed representation of the image pixel values that contain the speaker's m…

Cited by 132 publications (117 citation statements)
References 11 publications
“…We investigate five different appearance-based descriptors: Principal Component Analysis (PCA) [36], 2D Discrete Cosine Transform (DCT) [36], Discrete Wavelet Transform (DWT) [36], Local Binary Patterns (LBP) [37] and Histograms of Oriented Gradients (HOG) [38], all calculated on pixel intensities. We choose to investigate these appearance descriptors because they have proven to be highly descriptive and informative of facial expression changes by numerous studies on automatic facial expression recognition (e.g., [39]- [41]).…”
Section: Overview Of the Proposed Methods (mentioning)
confidence: 99%
“…These appearance features include PCA, DCT, DWT, LBP and HOG. The image-transform-based descriptors, i.e., PCA, DCT and DWT [36], are the most commonly used feature representations for visual speech processing tasks [50], [51]. LBP has been widely used as a robust image compression technique for texture representation [37], and it is one of the most commonly used facial appearance descriptors in face recognition and facial expression recognition [39], [40].…”
Section: Mouth Region Of Interest (ROI) Extraction (mentioning)
confidence: 99%
“…As the application domain is the same, lip reading classification techniques are often the same as those applied in the audio speech recognition (ASR) field and, consequently, dynamic time warping (DTW) [39,40] and HMMs [10,18,41], are popular. Moreover, by using a method common to both the audio and visual aspects of speech, there is the potential for a more straightforward combination of results obtained from separate audio and visual investigations and such integration has often been carried out using machine learning techniques, such as time delay neural network (TDNN) [42], support vector machines (SVM) [43] and AdaBoost [44].…”
Section: Speech Classification Based On Lip Features (mentioning)
confidence: 99%
“…As the area covered by such a mouth region can contain a large number of pixels (for example, assuming each color is represented in 8 bits, a 128 × 128 pixel region in RGB space will have a total of 49,152 pixels), a transformation to fewer dimensions is needed to make appearance-based approaches computationally manageable. Such transformations are typically borrowed from the image compression and pattern classification literature, such as principal components analysis (PCA) [11], the discrete cosine transform (DCT) [9], the discrete wavelet transform (DWT) [10] and linear discriminant analysis (LDA) [34].…”
Section: Lip Visual Features (mentioning)
confidence: 99%
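
The dimensionality reduction described in the statement above can be illustrated with a minimal sketch. It assumes a single-channel (grayscale) 128 × 128 mouth region filled with synthetic values, and it keeps only the top-left block of low-frequency 2D DCT coefficients. The function names `dct_matrix` and `dct_features`, the `keep=8` block size, and the coefficient selection rule are all illustrative assumptions, not the method of the cited paper or of any citing publication.

```python
import numpy as np


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II basis matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # rescale the DC row so that C @ C.T == I
    return c


def dct_features(roi, keep=8):
    """2D DCT of a grayscale region, keeping a keep x keep low-frequency block.

    The top-left block is one simple, assumed selection rule; other rules
    (for example, picking the highest-energy coefficients) are possible.
    """
    rows, cols = roi.shape
    coeffs = dct_matrix(rows) @ roi @ dct_matrix(cols).T
    return coeffs[:keep, :keep].ravel()


# Synthetic stand-in for a mouth region; a real system would crop it from video.
rng = np.random.default_rng(0)
roi = rng.random((128, 128))

features = dct_features(roi, keep=8)
print(roi.size, "->", features.size)  # prints: 16384 -> 64
```

With these assumed settings, each frame is reduced from 16,384 pixel values to a 64-dimensional feature vector, which is the kind of compact observation an HMM-based recognizer could consume.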