Explicit time correlation in hidden Markov models for speech recognition

Wellekens, C.

doi:10.1109/icassp.1987.1169614

Cited by 101 publications

(58 citation statements)

References 9 publications

(10 reference statements)


“…In this paper we propose using the autoregressive HMM [4]-[7] for speech synthesis. The autoregressive HMM relaxes the traditional HMM conditional independence assumption, allowing state output distributions which depend on past output as well as the current state.…”

mentioning

confidence: 99%

“…Autoregressive HMMs have been used before for speech recognition [4]-[6], [8], but have not been extensively investigated for speech synthesis. A basic formulation of the autoregressive HMM for statistical parametric speech synthesis showing how to do expectation maximization-based parameter estimation and parameter generation considering global variance was given in [11].…”

mentioning

confidence: 99%


Autoregressive Models for Statistical Parametric Speech Synthesis

Shannon

Zen

Byrne

2013

IEEE Trans. Audio Speech Lang. Process.


Abstract: We propose using the autoregressive hidden Markov model (HMM) for speech synthesis. The autoregressive HMM uses the same model for parameter estimation and synthesis in a consistent way, in contrast to the standard approach to statistical parametric speech synthesis. It supports easy and efficient parameter estimation using expectation maximization, in contrast to the trajectory HMM. At the same time its similarities to the standard approach allow use of established high quality synthesis algorithms such as speech parameter generation considering global variance. The autoregressive HMM also supports a speech parameter generation algorithm not available for the standard approach or the trajectory HMM and which has particular advantages in the domain of real-time, low latency synthesis. We show how to do efficient parameter estimation and synthesis with the autoregressive HMM and look at some of the similarities and differences between the standard approach, the trajectory HMM and the autoregressive HMM. We compare the three approaches in subjective and objective evaluations. We also systematically investigate which choices of parameters such as autoregressive order and number of states are optimal for the autoregressive HMM.


“…This DBN describes HMMs with explicit temporal correlation modelling [181], vector predictors [184], and buried Markov models [16]. Although an interesting direction for refining an HMM, this approach has not yet been adopted in mainstream state-of-the-art systems.…”

Section: Dynamic Bayesian Network

mentioning

confidence: 99%

The Application of Hidden Markov Models in Speech Recognition

Gales

Young

2007

FNT in Signal Processing


Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs. Whereas the basic principles underlying HMM-based LVCSR are rather straightforward, the approximations and simplifying assumptions involved in a direct implementation of these principles would result in a system which has poor accuracy and unacceptable sensitivity to changes in operating environment. Thus, the practical application of HMMs in modern systems involves considerable sophistication. The aim of this review is first to present the core architecture of a HMM-based LVCSR system and then describe the various refinements which are needed to achieve state-of-the-art performance. These refinements include feature projection, improved covariance modelling, discriminative parameter estimation, adaptation and normalisation, noise compensation and multi-pass system combination. The review concludes with a case study of LVCSR for Broadcast News and Conversation transcription in order to illustrate the techniques described.


“…The trace of this moving point is called the trajectory of the symbol. Several techniques were applied to model these trajectories [22,23,24,25]. …”

Section: 1

mentioning

confidence: 99%

A statistical model for an automatic procedure to compress a word transcription dictionary

Mouria-Beji

1998

Advances in Pattern Recognition


Abstract: Various experiments have conclusively shown that superior continuous speech recognition performance is obtained when using context-dependent phonemic models. However, we have observed that using an explicit context-dependent phonemic model can yield many transcriptions for a single lexicon entry. In this work, we study the compression of the word transcription dictionaries (WTD) into a more compact form to balance the need between flexibility and reliability. Based on a measure of a likelihood function, a statistical model for an automatic procedure to compress a WTD is developed. The compressed dictionary is then used for sentence recognition in a continuous speech recognition system. Experimental results indicate a substantial improvement of the recognition rate after compression.

Introduction: Several continuous speech recognition systems currently being developed use context-dependent models which seek to capture the pronunciation variations resulting from phonetic context effects. It has been observed that with these models, large vocabulary speech recognition systems usually require compressing, from some standard references, a word phonetic transcription dictionary (WTD) [1, 2, 3, 4]. Such a dictionary generally gives a single transcription for a lexicon entry. Continuous speech recognition systems have shown satisfactory results when using dictionaries prepared in this way [3, 5, 6, 7]. However, because of the large variations in the pronunciation of a given word [8, 9, 10, 11, 12, 13], it is very difficult, when compressing a dictionary, to capture its most representative variant. Word transcriptions are first obtained from the context-dependent phonemic model CODEPHON-STM based on the automatically expending speed and context (AESC) approach developed in our laboratory [11, 4], and then compressed into a more compact form to balance the need between flexibility and reliability using an automatic procedure. In fact, the direct word transcriptions given by CODEPHON-STM can yield many transcriptions for a single lexicon entry.
