IEEE International Conference on Acoustics, Speech and Signal Processing, 2002
DOI: 10.1109/icassp.2002.1006167

A coupled HMM for audio-visual speech recognition

Abstract: In recent years, several speech recognition systems that use visual information together with audio have shown a significant increase in performance over standard speech recognition systems. The use of visual features is justified both by the bimodality of speech generation and by the need for features that are invariant to acoustic noise perturbation. The audio-visual speech recognition system presented in this paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM…
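
The fusion idea described in the abstract can be pictured as two Markov chains, one per modality, whose state transitions are conditioned on the previous states of both chains. The sketch below is a minimal illustration of that coupling, assuming a two-chain coupled HMM with random toy parameters; the state counts, variable names and forward-pass formulation are illustrative and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
Na, Nv, T = 3, 3, 5          # hidden states per chain and number of frames (assumed sizes)

# Coupled transitions: A_a[i, j, k] = P(audio_t = k | audio_{t-1} = i, video_{t-1} = j)
A_a = rng.random((Na, Nv, Na)); A_a /= A_a.sum(axis=2, keepdims=True)
A_v = rng.random((Na, Nv, Nv)); A_v /= A_v.sum(axis=2, keepdims=True)

# Initial joint state distribution and per-frame observation likelihoods for each chain
# (stand-ins for the Gaussian-mixture likelihoods of the audio and visual features).
pi  = rng.random((Na, Nv)); pi /= pi.sum()
b_a = rng.random((T, Na))
b_v = rng.random((T, Nv))

def coupled_forward(pi, A_a, A_v, b_a, b_v):
    """Scaled forward pass over the joint (audio, video) state space of the coupled HMM."""
    alpha = pi * np.outer(b_a[0], b_v[0])              # alpha[i, j] at t = 0
    loglik = np.log(alpha.sum()); alpha /= alpha.sum()
    # The joint transition P((k, l) | (i, j)) factorizes into the two coupled tables.
    trans = A_a[:, :, :, None] * A_v[:, :, None, :]    # shape (Na, Nv, Na, Nv)
    for t in range(1, b_a.shape[0]):
        alpha = np.einsum('ij,ijkl->kl', alpha, trans) * np.outer(b_a[t], b_v[t])
        loglik += np.log(alpha.sum()); alpha /= alpha.sum()
    return loglik

print(coupled_forward(pi, A_a, A_v, b_a, b_v))         # log-likelihood of the toy sequence

Recognition with such a model would then amount to running this kind of forward (or Viterbi) pass for each word model and selecting the highest-scoring one.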

Cited by 57 publications (27 citation statements); references 4 publications. Citing publications span 2003 to 2021.

Citation statements, ordered by relevance:

“…Specifically, we follow the CHMM model used by [10] to carry out behavior pattern recognition. We develop an HMM for each of the conference room, hallway and cafeteria, and establish couplings between the three HMMs.…”
Section: Behavior Pattern Recognition Using Coupled HMM (mentioning)
Confidence: 99%

“…The observation probability for Gaussian mixture components [10] is given by […]. For finding the hidden states, we can use the Viterbi algorithm [10] for the CHMM as below:…”
Section: Behavior Pattern Recognition Using Coupled HMM (mentioning)
Confidence: 99%

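The equations this citing paper refers to were not captured in the excerpt above. For orientation only, the textbook form of a Gaussian-mixture observation likelihood and of a Viterbi recursion over a coupled audio-visual state space is sketched below; the symbols (w, \mu, \Sigma, \delta and the factored transition terms) are assumed notation and need not match the citing paper's.

% Gaussian-mixture observation likelihood for state i of channel c (audio or video):
b_t^{c}(i) = \sum_{m=1}^{M} w_{i,m}^{c}\, \mathcal{N}\big(O_t^{c};\, \mu_{i,m}^{c},\, \Sigma_{i,m}^{c}\big)

% Viterbi recursion over the coupled state pair (i, j), with factored coupled transitions:
\delta_t(i, j) = \Big[\max_{k,\,l}\ \delta_{t-1}(k, l)\, P(i \mid k, l)\, P(j \mid k, l)\Big]\, b_t^{a}(i)\, b_t^{v}(j)
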
“…Verma et al. [17] investigated early and late integration of modalities, finding that in this context late integration improves on early integration. Different structures of hidden Markov models (HMM), such as multistream and product HMM [9,10], pairwise and asynchronous HMM [2], and factorial and coupled HMM [12,11], have been investigated, taking into account the dominant role of the audio stream. Although those results indicate that the coupled HMM is a good model choice, this finding cannot be transferred directly to sign language recognition because here the modalities are more strongly decoupled than in audio-visual speech recognition and there is no "master" modality.…”
Section: Introduction and Related Work (mentioning)
Confidence: 99%

“…Visual speech recognition is an active research topic and plays an essential role in the development of many multimedia systems such as audio-visual speech recognition (AVSR) [4], mobile phone applications and sign language recognition [10]. The inclusion of lip visual features to assist audio or hand recognition is an opportune option because this information is robust to acoustic noise.…”
Section: Introduction (mentioning)
Confidence: 99%