A hybrid segmental neural net/hidden Markov model system for continuous speech recognition

Zavaliagkos, G.; Zhao, Ying; Schwartz, Richard; Makhoul, John

doi:10.1109/89.260358

Cited by 59 publications

(27 citation statements)

References 26 publications


“…The idea, proposed in [130], is developed in [132], where a connectionist approach is applied to the problem of rescoring the hypothesis generated by an HMM which uses an N-best strategy. In this case the network does not compute scores on individual acoustic frames, but on whole segments (sub-sequences) of frames, corresponding to phonemes.…”

Section: Other Approaches (mentioning)

confidence: 99%

“…Different neural architectures were tried (1-layer perceptron, MLP, HyperBF) [132] without significant fluctuations in performance. They were also combined altogether to obtain a more robust rescoring process.…”

Section: Other Approaches (mentioning)

confidence: 99%

“…Connectionist rescoring [130,132]: A segmental neural network computes scores on whole segments of frames, corresponding to phonemes, for re-scoring the segmentation hypothesis yielded by an HMM which uses an N-best strategy. Rescoring of ‘confusable words’ [78]: MLPs are applied after the HMM has generated the N-best word-sequence hypothesis, to ‘correct’ (or confirm) those individual words that belong to specific ‘confusable’ classes.…”

Section: Model (mentioning)

confidence: 99%
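
The N-best rescoring scheme described in the statements above can be illustrated with a minimal Python sketch. It is not the implementation from the cited paper: the `Hypothesis` class, the `rescore_nbest` function, the `weight` parameter, and the toy scorer are hypothetical names introduced here, and a simple additive combination of log scores is assumed. A real system would replace `toy_segment_score` with a trained segmental neural network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (phoneme label, start frame, end frame exclusive): a hypothetical segment layout.
Segment = Tuple[str, int, int]


@dataclass
class Hypothesis:
    """One entry of an N-best list produced by a first-pass HMM recognizer."""
    words: str
    hmm_log_score: float
    segments: List[Segment]


def rescore_nbest(
    hyps: Sequence[Hypothesis],
    frames: Sequence[float],
    segment_log_score: Callable[[str, Sequence[float]], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank hypotheses by HMM score plus weighted segment-level scores."""
    def combined(h: Hypothesis) -> float:
        # Score each phoneme on its whole segment of frames, not frame by frame.
        seg_total = sum(
            segment_log_score(label, frames[start:end])
            for label, start, end in h.segments
        )
        return h.hmm_log_score + weight * seg_total

    return sorted(hyps, key=combined, reverse=True)


# Toy stand-in for a segmental network (assumption): one scalar feature per frame.
TARGETS = {"a": 0.0, "b": 1.0}


def toy_segment_score(label: str, segment: Sequence[float]) -> float:
    mean = sum(segment) / len(segment)
    return -abs(mean - TARGETS[label])


frames = [0.0, 0.0, 1.0, 1.0]
nbest = [
    Hypothesis("ba", -9.5, [("b", 0, 2), ("a", 2, 4)]),
    Hypothesis("ab", -10.0, [("a", 0, 2), ("b", 2, 4)]),
]
best = rescore_nbest(nbest, frames, toy_segment_score)[0]
print(best.words)
```

In this toy example the HMM score alone ranks "ba" first, while the combined score ranks "ab" first, because the segment scorer penalizes the segmentation that mismatches the frames.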


A survey of hybrid ANN/HMM models for automatic speech recognition

2001


In spite of the advances accomplished throughout the last decades, automatic speech recognition (ASR) is still a challenging and difficult task. In particular, recognition systems based on hidden Markov models (HMMs) are effective under many circumstances, but do suffer from some major limitations that limit applicability of ASR technology in real-world environments. Attempts were made to overcome these limitations with the adoption of artificial neural networks (ANN) as an alternative paradigm for ASR, but ANN were unsuccessful in dealing with long time-sequences of speech signals. Between the end of the 1980s and the beginning of the 1990s, some researchers began exploring a new research area, by combining HMMs and ANNs within a single, hybrid architecture. The goal in hybrid systems for ASR is to take advantage from the properties of both HMMs and ANNs, improving flexibility and recognition performance. A variety of different architectures and novel training algorithms have been proposed in literature. This paper reviews a number of significant hybrid models for ASR, putting together approaches and techniques from a highly specialistic and non-homogeneous literature. Efforts concentrate on describing and referencing architectures and algorithms, their advantages and limitations, as well as on categorizing them into broad classes. Early attempts to emulate HMMs by ANNs are first described. Then we focus on ANNs to estimate posterior probabilities of the states of an HMM and on ‘global’ optimization, where a single, overall training criterion is defined over the HMM and the ANNs. Connectionist vector quantization for discrete HMMs, and other more recent approaches are also reviewed. It is pointed out that, in addition to their theoretical interest, hybrid systems have been allowing for tangible improvements in recognition performance over the standard HMMs in difficult and significant benchmark tasks.


“…The most popular solution for avoiding the problems associated with P(S|A) is to run a frame-based (e.g. HMM) recognizer, and re-score only the N best paths by the segmental phoneme models [11]. We, however, wanted to model P(S|A) with discriminative classifiers, for which we trained segmental probabilities P(s_i|A_i).…”

Section: Segment-based Recognition (mentioning)

confidence: 99%

A Discriminative Segmental Speech Model and Its Application to Hungarian Number Recognition

Tóth, Kocsor, Kovács (2000)

Text, Speech and Dialogue


Abstract. This paper presents a stochastic segmental speech recognizer that models the a posteriori probabilities directly. The main issues concerning the system are segmental phoneme classification, utterance-level aggregation and the pruning of the search space. For phoneme classification artificial neural networks and support vector machines are applied. Phonemic segmentation and utterance-level aggregation is performed with the aid of anti-phoneme modeling. At the phoneme level the system convincingly outperforms the HMM system trained on the same corpus, while at the word level it attains the performance of the HMM system trained without embedded training.


“…MLP-based posteriors have also been used to re-score hypothesis in continuous speech recognition [4].…”

Section: Introduction (mentioning)

confidence: 99%

Posterior-based confidence measures for spoken term detection

Wang, Tejedor, Frankel, et al. (2009)

2009 IEEE International Conference on Acoustics, Speech and Signal Processing


Confidence measures play a key role in spoken term detection (STD) tasks. The confidence measure expresses the posterior probability of the search term appearing in the detection period, given the speech. Traditional approaches are based on the acoustic and language model scores for candidate detections found using automatic speech recognition, with Bayes' rule being used to compute the desired posterior probability. In this paper, we present a novel direct posterior-based confidence measure which, instead of resorting to the Bayesian formula, calculates posterior probabilities from a multi-layer perceptron (MLP) directly. Compared with traditional Bayesian-based methods, the direct-posterior approach is conceptually and mathematically simpler. Moreover, the MLP-based model does not require assumptions to be made about the acoustic features such as their statistical distribution and the independence of static and dynamic coefficients. Our experimental results in both English and Spanish demonstrate that the proposed direct posterior-based confidence improves STD performance.
