2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (Cat. No. 01CH37221), 2001
DOI: 10.1109/icassp.2001.940794

Asynchronous stream modeling for large vocabulary audio-visual speech recognition

Cited by 60 publications (46 citation statements). References 8 publications.

“…The resulting observation sequences are then modeled using one HMM [12]. A model fusion system based on multi-stream HMM was proposed in [13]. The multi-stream HMM assumes that the audio and video sequences are state synchronous but allows the audio and video components to make different contributions to the overall observation likelihood.…”
Section: Related Work
confidence: 99%
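
The state-synchronous combination described in this excerpt is conventionally written as a weighted product of the per-stream emission densities; the stream exponents λ_a and λ_v below are generic placeholders, not values taken from the cited work:

\[
b_j\!\left(o_t^{(a)}, o_t^{(v)}\right) \;=\; \left[\,b_j^{(a)}\!\left(o_t^{(a)}\right)\right]^{\lambda_a} \left[\,b_j^{(v)}\!\left(o_t^{(v)}\right)\right]^{\lambda_v},
\]

where both streams share the same state j at frame t (state synchrony), and the exponents let the audio and video components contribute unequally to the overall likelihood.
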
“…A related model is the factorial HMM [25], in which there is a single observation sequence but multiple state sequences that indirectly interact through their common influence on the observations. These models have found wide use in automatic speech recognition for multi-stream [4], [39] and audio-visual modeling [36]. Multiscale statistical models in the second group have been explored in many different facets of signal processing and data fusion; see [53] for an extensive review.…”
Section: HMMs and Previous Work in Multiscale Modeling
confidence: 99%
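
As a sketch of the factorial structure mentioned in this excerpt: M state chains evolve with independent dynamics while jointly generating a single observation sequence. The notation is generic, not taken from [25]:

\[
p\!\left(s_t \mid s_{t-1}\right) \;=\; \prod_{m=1}^{M} p\!\left(s_t^{(m)} \,\middle|\, s_{t-1}^{(m)}\right), \qquad o_t \,\sim\, p\!\left(o_t \,\middle|\, s_t^{(1)}, \ldots, s_t^{(M)}\right).
\]

The chains interact only indirectly, through their common influence on o_t, which is what distinguishes the factorial HMM from running M independent HMMs.
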
“…The LDA and MLLT transforms were trained for each noise condition. The video stream features were obtained by an LDA-MLLT transform of the pixels in a region of interest around the mouth, as described in [2]. The audio-visual modeling is based on context-dependent phone models.…”
Section: Evaluation Tasks
confidence: 99%
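
A minimal sketch of the video feature step named in this excerpt, assuming a grayscale mouth region of interest and a combined LDA-MLLT projection matrix estimated offline; the function name, matrix shape, and dimensions here are illustrative, not taken from [2]:

```python
import numpy as np

def video_features(mouth_roi: np.ndarray, W_lda_mllt: np.ndarray) -> np.ndarray:
    """Project a (H, W) mouth ROI to a d-dimensional visual feature.

    W_lda_mllt has shape (d, H * W): an LDA projection followed by a
    maximum-likelihood linear transform, folded into a single matrix.
    Both are assumed to have been trained offline (hypothetical setup).
    """
    x = mouth_roi.reshape(-1).astype(np.float64)  # flatten ROI pixels
    return W_lda_mllt @ x                         # (d,) feature vector

# Example with stand-in data: a 32x32 ROI projected to 24 dimensions.
rng = np.random.default_rng(0)
roi = rng.random((32, 32))
W = rng.standard_normal((24, 32 * 32))
print(video_features(roi, W).shape)  # (24,)
```
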
“…Various models of audio-visual integration for speech recognition have been proposed, among which the multistream hidden Markov model (MSHMM) has been demonstrated to consistently improve recognition over audio-only ASR [1,2,3]. This model is based on the use of parallel HMMs to represent various streams of information.…”
Section: Introduction
confidence: 99%
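
A minimal sketch of the state-synchronous MSHMM combination this excerpt refers to, assuming per-stream emission log-likelihoods are already available per frame and state; the stream exponents are hypothetical example values, not taken from [1,2,3]:

```python
import numpy as np

def combine_streams(log_b_audio: np.ndarray,
                    log_b_video: np.ndarray,
                    lam_a: float = 0.7,
                    lam_v: float = 0.3) -> np.ndarray:
    """Weighted log-domain product of audio and video emission scores.

    Both arrays have shape (T, N): T frames, N shared HMM states.
    State synchrony means the two streams index the same state at
    every frame; only their likelihood exponents differ.
    """
    return lam_a * log_b_audio + lam_v * log_b_video

# Example: 3 frames, 2 states.
la = np.log(np.array([[0.6, 0.4], [0.5, 0.5], [0.7, 0.3]]))
lv = np.log(np.array([[0.8, 0.2], [0.4, 0.6], [0.5, 0.5]]))
print(combine_streams(la, lv))  # (3, 2) combined log-scores
```

The combined scores can then be plugged into a standard Viterbi or forward pass over the shared state sequence.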