Template-Based Continuous Speech Recognition

De Wachter, Mathias; Matton, Mike; Demuynck, Kris; Wambacq, Patrick; Cools, Ronald; Van Compernolle, Dirk

doi:10.1109/tasl.2007.894524

Cited by 129 publications

(91 citation statements)

References 25 publications

“…(De Wachter et al, 2007)) in the scope of this paper, which would immediately enhance the recognition performance at higher SNR levels. In such a setting, the acoustic scores obtained from both streams can be combined to benefit from the noise robustness of exemplar-based acoustic modeling and better discrimination of the statistical models such as complex GMM distributions in conjunction with MFCC features or DNNs.…”

Section: General Discussion and Concluding Remarks (mentioning)

confidence: 97%

“…Considering the dimensionality and computational restrictions, the same framework using exemplars associated with more general subword units such as phones or syllables could be applied to a medium or large vocabulary task. Only the current decoding scheme would need to be redesigned in a way that it will incorporate a language model combined with the acoustic costs, but for this it could largely rely on existing exemplar matching frameworks (De Wachter et al, 2007).…”

Section: General Discussion and Concluding Remarks (mentioning)

confidence: 99%

“…Data-driven automatic speech recognition (ASR) techniques (De Wachter et al, 2003; Aradilla et al, 2005; Deselaers et al, 2007; Sundaram and Bellegarda, 2012; Sainath et al, 2012; Heigold et al, 2012; Sun et al, 2014) became popular in the last decade as a viable alternative after the long dominance of statistical acoustic modeling in the form of the Gaussian mixture models (GMM) in hidden Markov models (HMM) (Bourlard et al, 1996). Templates or exemplars are labeled speech segments of multiple lengths extracted from training data, each associated with a certain class, i.e.…”

Section: Introduction (mentioning)

confidence: 99%

“…Exemplar matching-based recognition can be performed by evaluating the similarity of the exemplars with the segments from the input speech with respect to a distance/divergence metric by applying dynamic time warping (Sakoe and Chiba, 1971; Ney and Ortmanns, 1999; De Wachter et al, 2007). In these applications, speech is represented using discriminatively trained features to ensure that the used distance/divergence metric mostly yields lower scores for the matching class compared to the other classes, resulting in increased recognition accuracies.…”

Section: Introduction (mentioning)

confidence: 99%
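
The exemplar matching described in the statement above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the decoder of De Wachter et al.: the names `dtw_distance` and `classify_segment`, the Euclidean frame distance, and the unconstrained three-way step pattern are all hypothetical choices made for this example.

```python
import math


def euclidean(a, b):
    """Frame-level distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, segment, dist=euclidean):
    """Accumulated cost of the best DTW alignment between two frame sequences."""
    n, m = len(template), len(segment)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # cost[i][j] = best accumulated cost aligning template[:i] with segment[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(template[i - 1], segment[j - 1])
            # Allowed steps: skip forward in either sequence, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_segment(segment, exemplars):
    """Return the label of the exemplar with the lowest DTW cost.

    `exemplars` is a list of (label, frames) pairs.
    """
    best_label, _ = min(exemplars, key=lambda e: dtw_distance(e[1], segment))
    return best_label
```

A segment is assigned the class of its closest exemplar, which is why the quoted passage stresses features for which the matching class tends to score lowest. A continuous-speech system would additionally need segmentation and a language model, which this sketch omits.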

Noise robust exemplar matching with alpha–beta divergence

Yılmaz, Gemmeke, hamme. 2016. Speech Communication

The noise robust exemplar matching (N-REM) framework performs automatic speech recognition using exemplars, which are the labeled spectrographic representations of speech segments extracted from training data. By incorporating a sparse representations formulation, this technique remedies the inherent noise modeling problem of conventional exemplar matching-based automatic speech recognition systems. In this framework, noisy speech segments are approximated as a sparse linear combination of the exemplars of multiple lengths, each associated with a single speech unit such as words, half-words or phones. On account of the reconstruction error-based back end, the recognition accuracy highly depends on the congruence of the speech features and the divergence metric used to compare the speech segments with exemplars. In this work, we replace the conventional Kullback-Leibler divergence (KLD) with a generalized divergence family called the Alpha-Beta divergence with two parameters, α and β, in conjunction with mel-scaled magnitude spectral features. The proposed recognizer traverses the (α,β) plane depending on the amount of contamination to provide better separation of speech and noise sources. Moreover, we apply our recently proposed active noise exemplar selection (ANES) technique in a more realistic scenario where the target utterances are degraded by genuine room noise. Recognition experiments on the small vocabulary track of the 2nd CHiME Challenge and the AURORA-2 database have shown that the novel recognizer with the AB divergence and ANES outperforms the baseline system using the generalized KLD with tuned sparsity, especially at lower SNR levels.
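
For readers unfamiliar with the divergence family named in this abstract, the sketch below evaluates the Alpha-Beta divergence in the form commonly given in the literature for α, β and α+β all nonzero. The abstract itself does not state the formula, so the definition used here, the function names `ab_divergence` and `generalized_kld`, and the restriction to strictly positive inputs are assumptions of this example rather than details taken from the paper.

```python
import math


def ab_divergence(p, q, alpha, beta):
    """Alpha-Beta divergence between two positive vectors.

    Assumed form (valid only when alpha, beta and alpha + beta are nonzero):
      D = -1/(alpha*beta) * sum( p^alpha * q^beta
                                 - alpha/(alpha+beta) * p^(alpha+beta)
                                 - beta/(alpha+beta)  * q^(alpha+beta) )
    """
    s = alpha + beta
    if alpha == 0 or beta == 0 or s == 0:
        raise ValueError("this sketch covers only alpha, beta, alpha+beta != 0")
    total = 0.0
    for x, y in zip(p, q):
        total += x ** alpha * y ** beta - (alpha / s) * x ** s - (beta / s) * y ** s
    return -total / (alpha * beta)


def generalized_kld(p, q):
    """Generalized Kullback-Leibler divergence for positive, unnormalized vectors."""
    return sum(x * math.log(x / y) - x + y for x, y in zip(p, q))
```

With α = β = 1 the expression reduces to half the squared Euclidean distance, and as β approaches 0 with α = 1 it approaches the generalized KLD, so moving through the (α,β) plane interpolates between familiar cost functions.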

“…In the history of automatic speech recognition (ASR), DTW first became popular in isolated and connected word recognition and then was supplanted by hidden Markov models (HMMs), a statistical modeling framework appropriate for large vocabulary continuous speech recognition (LVCSR). However, DTW has drawn much interest recently for unsupervised and low-resource tasks, e.g., template-based speech recognition [2,3], unsupervised speech pattern discovery [4,5], example-based spoken term detection (STD) [6,7] and acoustic-based spoken document segmentation [8]. Recently, Zhang et.…”

Section: Introduction (mentioning)

confidence: 99%

A tighter lower bound estimate for dynamic time warping

Yang, Xie, Luan et al. 2013. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper, we propose a new lower-bound estimate for speeding up dynamic time warping (DTW) on multivariate time sequences. It has several advantages as compared with the inner-product lower bound [1] recently proposed to eliminate a large number of DTW computations. First, we prove that it is tighter than the inner product lower bound while the computational complexity remains comparable. Second, the inner product lower bound is specifically designed for the inner product distance while the proposed lower bound is valid for any distance measure. Third, DTW search can be further speeded up since the distance matrix is calculated in advance at the lower bound estimation stage. Spoken term detection experiments on the TIMIT corpus show that the proposed lower bound estimate is able to reduce the computational requirements for DTW-KNN search by 54% as compared with the inner-product lower bound.

Ensemble Learning Approaches in Speech Recognition

Zhao, Xue, Chen. 2014. Speech and Audio Processing for Coding, Enhancement and Recognition

An overview is made on the ensemble learning efforts that have emerged in automatic speech recognition in recent years. The approaches that are based on different machine learning techniques and target various levels and components of speech recognition are described, and their effectiveness is discussed in terms of the direct performance measure of word error rate and the indirect measures of classification margin, diversity, as well as bias and variance. In addition, methods on reducing storage and computation costs of ensemble models for practical deployments of speech recognition systems are discussed. Ensemble learning for speech recognition has been largely fruitful, and it is expected to continue progress along with the advances in machine learning, speech and language modeling, as well as computing technology.
