Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

Aleksic, Petar S.; Williams, J.J.; Wu, Zhilin; Katsaggelos, Aggelos K.

doi:10.1155/s1110865702206162

Cited by 47 publications

(33 citation statements)

References 28 publications

“…There exist many techniques in the literature that attempt to solve the lip segmentation/tracking problem [12], [29]- [35]. The performance of these techniques usually depend on acquisition specifics such as image quality, resolution, head pose and illumination conditions.…”

Section: Extraction Of Contour-based Motion Features 1) Lip Contou

mentioning

confidence: 99%

“…Deformable templates [4], [5], active shape models (ASM) [6], [10], [11], and snakes [12] have been used to obtain different lip geometry features; however, they all suffer from complex feature extraction and training procedures. In [5], Gaussian mixture models (GMM) are used to model both the lip and the non-lip region, and lip tracking is performed by deformable templates.…”

mentioning

confidence: 99%

“…Aleksic et al [12] use gradient vector flow (GVF) snakes to extract outer lip contour and calculate the lip movement at ten predefined points by point-wise coordinate difference. They then reduce the feature dimension by PCA and use lip features together with other facial animation features.…”

mentioning

confidence: 99%

Discriminative Analysis of Lip Motion Features for Speaker Identification and Speech-Reading

Çetingül

Yemez

Erzin

et al. 2006

IEEE Trans. on Image Process.

101

Abstract: There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and, 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered including dense motion features within a bounding box about the lip, lip contour motion features, and combination of these with lip shape features. Furthermore, a novel two-stage, spatial, and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable in the case of speech-reading application.

Index Terms: Bayesian discriminative feature selection, lip motion, speaker identification, speech recognition, temporal discriminative feature selection.

“…Though the discriminant analysis is followed by a smoothing step, the segmentation remains noisy. Snakes [16] have been widely used for lip segmentation ( [1] [20]) because snakes can take into account in a same framework smoothing and elasticity constraints. Snake-based methods yield to interesting results but the main drawback is the tuning of several parameters.…”

Section: Introduction

mentioning

confidence: 99%

Parametric models for facial features segmentation

et al. 2006

Abstract: In this paper, we are dealing with the problem of facial features segmentation (mouth, eyes and eyebrows). A specific parametric model is defined for each deformable feature, each model being able to take into account all the possible deformations. In order to initialize each model, some characteristic points are extracted on each image to be processed (for example, eyes corners, mouth corners and brows corners). In order to fit the model with the contours to be extracted, a gradient flow (of luminance or chrominance) through the estimated contour is maximized because at each point of the searched contour, the gradient (of luminance or chrominance) is normal. The definition of a model associated to each feature offers the possibility to introduce a regularisation constraint. However, the chosen models are flexible enough to produce realistic contours for the mouth, the eyes and the eyebrows. This facial features segmentation is the first step of a set of multi-media applications.

“…Significant research has been carried out to accurately obtain the outer lip contour. One of the most popular approaches is using snakes (Kass et al 1988), which have the ability to take smoothing and elasticity constraints into account (Terzopoulos and Waters 1993;Aleksic et al 2002). Another popular approach is using active shape models and appearance shape models.…”

Section: Lip Reading

mentioning

confidence: 99%

Gestural Interfaces for Hearing-Impaired Communication

Aran

Bürger

Akarun

et al.

Multimodal User Interfaces

Abstract. Gestural interfaces, besides providing natural means of human-computer interaction for everyone, enable the hearing impaired to use sign language or better understand speech through vision. This chapter overviews (1) the various modalities involved in gestured languages (2) the mean to automatically apprehend them individually and (3) to fuse them in order to provide a communication medium adapted to hearing-impaired. We present two example applications, a sign language tutoring tool and a cued speech interpreter and discuss theoretical and practical aspects.
