IEEE International Conference on Acoustics, Speech and Signal Processing, 2002
DOI: 10.1109/icassp.2002.1006167

A coupled HMM for audio-visual speech recognition

Abstract: In recent years, several speech recognition systems that use visual information together with audio have shown a significant increase in performance over standard speech recognition systems. The use of visual features is justified both by the bimodality of speech generation and by the need for features that are invariant to acoustic noise perturbation. The audio-visual speech recognition system presented in this paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM…
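
The fusion idea described in the abstract can be pictured as two Markov chains, one per modality, whose state transitions are conditioned on the previous states of both chains. The sketch below is a minimal illustration of that coupling, assuming a two-chain coupled HMM with random toy parameters; the state counts, variable names and forward-pass formulation are illustrative and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
Na, Nv, T = 3, 3, 5          # hidden states per chain and number of frames (assumed sizes)

# Coupled transitions: A_a[i, j, k] = P(audio_t = k | audio_{t-1} = i, video_{t-1} = j)
A_a = rng.random((Na, Nv, Na)); A_a /= A_a.sum(axis=2, keepdims=True)
A_v = rng.random((Na, Nv, Nv)); A_v /= A_v.sum(axis=2, keepdims=True)

# Initial joint state distribution and per-frame observation likelihoods for each chain
# (stand-ins for the Gaussian-mixture likelihoods of the audio and visual features).
pi  = rng.random((Na, Nv)); pi /= pi.sum()
b_a = rng.random((T, Na))
b_v = rng.random((T, Nv))

def coupled_forward(pi, A_a, A_v, b_a, b_v):
    """Scaled forward pass over the joint (audio, video) state space of the coupled HMM."""
    alpha = pi * np.outer(b_a[0], b_v[0])              # alpha[i, j] at t = 0
    loglik = np.log(alpha.sum()); alpha /= alpha.sum()
    # The joint transition P((k, l) | (i, j)) factorizes into the two coupled tables.
    trans = A_a[:, :, :, None] * A_v[:, :, None, :]    # shape (Na, Nv, Na, Nv)
    for t in range(1, b_a.shape[0]):
        alpha = np.einsum('ij,ijkl->kl', alpha, trans) * np.outer(b_a[t], b_v[t])
        loglik += np.log(alpha.sum()); alpha /= alpha.sum()
    return loglik

print(coupled_forward(pi, A_a, A_v, b_a, b_v))         # log-likelihood of the toy sequence

Recognition with such a model would then amount to running this kind of forward (or Viterbi) pass for each word model and selecting the highest-scoring one.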

Cited by 57 publications (27 citation statements); references 4 publications. Citing publications span 2003 to 2021.

Citation statements, ordered by relevance:

“…Specifically, we follow the CHMM model used by [10] to carry out behavior pattern recognition. We develop an HMM for each of the conference room, hallway and cafeteria, and establish couplings between the three HMMs.…”
Section: Behavior Pattern Recognition Using Coupled HMM (mentioning)
Confidence: 99%

“…The observation probability for Gaussian mixture components [10] is given by […]. For finding the hidden states, we can use the Viterbi algorithm [10] for the CHMM as below:…”
Section: Behavior Pattern Recognition Using Coupled HMM (mentioning)
Confidence: 99%

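The equations this citing paper refers to were not captured in the excerpt above. For orientation only, the textbook form of a Gaussian-mixture observation likelihood and of a Viterbi recursion over a coupled audio-visual state space is sketched below; the symbols (w, \mu, \Sigma, \delta and the factored transition terms) are assumed notation and need not match the citing paper's.

% Gaussian-mixture observation likelihood for state i of channel c (audio or video):
b_t^{c}(i) = \sum_{m=1}^{M} w_{i,m}^{c}\, \mathcal{N}\big(O_t^{c};\, \mu_{i,m}^{c},\, \Sigma_{i,m}^{c}\big)

% Viterbi recursion over the coupled state pair (i, j), with factored coupled transitions:
\delta_t(i, j) = \Big[\max_{k,\,l}\ \delta_{t-1}(k, l)\, P(i \mid k, l)\, P(j \mid k, l)\Big]\, b_t^{a}(i)\, b_t^{v}(j)
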
“…Verma et al. [17] investigated early and late integration of modalities, finding that in this context late integration improves on early integration. Different structures of hidden Markov models (HMM), such as multistream and product HMM [9,10], pairwise and asynchronous HMM [2], and factorial and coupled HMM [12,11], have been investigated, taking into account the dominant role of the audio stream. Although those results indicate that the coupled HMM is a good model choice, this finding cannot be transferred directly to sign language recognition because here the modalities are more strongly decoupled than in audio-visual speech recognition and there is no "master" modality.…”
Section: Introduction and Related Work (mentioning)
Confidence: 99%

“…Visual speech recognition is an active research topic and plays an essential role in the development of many multimedia systems such as audio-visual speech recognition (AVSR) [4], mobile phone applications and sign language recognition [10]. The inclusion of lip visual features to assist audio or hand recognition is an opportune option because this information is robust to acoustic noise.…”
Section: Introduction (mentioning)
Confidence: 99%