Hierarchical discriminant features for audio-visual LVCSR

Potamianos, Gerasimos; Luettin, Juergen; Neti, C.

doi:10.1109/icassp.2001.940793

Cited by 56 publications

(53 citation statements)

References 9 publications

“…As such, visual speech recognition has become the focus of numerous research projects. Work includes investigations into modelling techniques and feature stream combination [2][3][4][5], feature extraction [6][7][8] and more recently head pose invariant lip-reading [9][10]. One of the factors limiting the recognition accuracy of visual only speech recognition is the small number of possible lip shapes/movements in relation to the range of corresponding vocal sounds.…”

Section: Introduction (mentioning)

confidence: 99%

Inter-frame contextual modelling for visual speech recognition

Pass, Hanna et al. 2010

2010 IEEE International Conference on Image Processing

In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining InterFrame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker independent isolated digit recognition task, and compare our approach to two commonly adopted feature based techniques for incorporating speech dynamics. Results are presented from baseline feature based systems and the combined modelling technique. We illustrate that both of these techniques achieve similar levels of performance when used independently. However, significant improvements in performance can be achieved through a combination of the two. In particular, we report an improvement in excess of 17% relative Word Error Rate in comparison to our best baseline system.

“…In the first case, the audio and visual features are combined projecting them onto an audio-visual feature space, where traditional single-stream classifiers are used [33][34][35][36]. Decision fusion, on its turn, processes the streams separately and, at a certain level, combines the outputs of each single-modality classifier.…”

Section: Audio-visual Integration and Classification (mentioning)

confidence: 99%

Multi-pose lipreading and audio-visual speech recognition

Estellers, Thiran 2012

EURASIP J. Adv. Signal Process.

In this article, we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on overcoming the effects of a changing pose of the speaker, a problem encountered in natural situations where the speaker moves freely and does not keep a frontal pose with relation to the camera. To handle these situations, we introduce a pose normalization block in a standard system and generate virtual frontal views from non-frontal images. The proposed method is inspired by pose-invariant face recognition and relies on linear regression to find an approximate mapping between images from different poses. We integrate the proposed pose normalization block at different stages of the speech recognition system and quantify the loss of performance related to pose changes and pose normalization techniques. In audio-visual experiments we also analyze the integration of the audio and visual streams. We show that an audio-visual system should account for non-frontal poses and normalization techniques in terms of the weight assigned to the visual stream in the classifier.

“…Visual speech information can play a vital role for the improvement of natural and robust human-computer interaction [3,4]. Most published works in the areas of speech recognition and speaker recognition focus on speech under the noiseless environments and few published works focus on speech under noisy conditions [5,6].…”

Section: Introduction (mentioning)

confidence: 99%

Hybrid Feature and Decision Fusion Based Audio-Visual Speaker Identification in Challenging Environment

Islam, Rahman 2010

IJCA

The contribution of this paper is a novel approach to evaluating the performance of a noise-robust audio-visual speaker identification system in challenging environments. Though the traditional HMM-based audio-visual speaker identification system is very sensitive to speech parameter variation, the proposed hybrid feature and decision fusion based audio-visual speaker identification is found to be stable and performs well, improving the robustness and naturalness of human-computer interaction. Linear Prediction Cepstral Coefficients and Mel Frequency Cepstral Coefficients are used to extract the audio features, and the Active Appearance Model and Active Shape Model are used to extract the appearance- and shape-based features of the facial image. Principal Component Analysis is used to reduce the dimensionality of the large feature vector, and a vector normalization algorithm is used to normalize it. Features and decisions are fused at two different levels, and finally four different classifier outputs are combined in parallel to achieve the identification result. The performance of all these uni-modal and multi-modal systems has been evaluated and compared on the VALID audio-visual multi-modal database, which contains both vocal and visual biometric modalities.
