Nonlinear discriminant analysis for improved speech recognition

Fontaine, Vincent; Ris, Christophe; Boite, Jean-Marc

doi:10.21437/eurospeech.1997-548

Cited by 24 publications (3 citation statements)

“…One of the main benefits of the explicit alignment approaches such as CTC, RNN-T, or RNA is that they result in ASR models that are easily amenable to frame-synchronous decoding.⁶ In this section, we discuss the attention-based encoder-decoder (AED) models (also known as listen-attend-and-spell (LAS)) [15], [16], [53], which employ the attention mechanism [43] to implicitly identify and model the portions of the input acoustics which are relevant to each output unit. These models were first popularized in the context of machine translation [54].…”

Section: Implicit Alignment Modeling Approaches (mentioning; confidence: 99%)

“…Within the classical approach, deep learning has been introduced into acoustic and language modeling. In acoustic modeling, deep learning has replaced Gaussian mixture distributions (hybrid HMM [4], [5]) or augmented the acoustic feature set (e.g., non-linear discriminant/tandem approach [6], [7]). In language modeling, deep learning has replaced count-based approaches [8], [9], [10].…”

Section: Introduction (mentioning; confidence: 99%)

End-to-End Speech Recognition: A Survey

Prabhavalkar, Hori, Sainath, et al. 2024

IEEE/ACM Trans. Audio Speech Lang. Process.

In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning has brought considerable reductions in word error rate of more than 50% relative, compared to modeling without deep learning. In the wake of this transition, a number of all-neural ASR architectures have been introduced. These so-called end-to-end (E2E) models provide highly integrated, completely neural ASR models, which rely strongly on general machine learning knowledge, learn more consistently from data, with lower dependence on ASR domain-specific experience. The success and enthusiastic adoption of deep learning, accompanied by more generic model architectures, has led to E2E models now becoming the prominent ASR approach. The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements, and to discuss their properties and their relationship to classical hidden Markov model (HMM) based ASR architectures. All relevant aspects of E2E ASR are covered in this work: modeling, training, decoding, and external language model integration, discussions of performance and deployment opportunities, as well as an outlook into potential future developments.

“…Within the classical approach, deep learning has been introduced to acoustic and language modeling. In acoustic modeling, deep learning replaced Gaussian mixture distributions (hybrid HMM [3], [4]) or augmented the acoustic feature set (nonlinear discriminant/tandem approach [5], [6]). In language modeling, deep learning replaced count-based approaches [7], [8], [9].…”

Section: Introduction (mentioning; confidence: 99%)

End-to-End Speech Recognition: A Survey

Prabhavalkar, Hori, Sainath, et al. 2023

Preprint

In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50% relative, compared to modeling without deep learning. In the wake of this transition, a number of all-neural ASR architectures were introduced. These so-called end-to-end (E2E) models provide highly integrated, completely neural ASR models, which rely strongly on general machine learning knowledge, learn more consistently from data, while depending less on ASR domain-specific experience. The success and enthusiastic adoption of deep learning, accompanied by more generic model architectures, led to E2E models now becoming the prominent ASR approach. The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements, and to discuss their properties and their relation to the classical hidden Markov model (HMM) based ASR architecture. All relevant aspects of E2E ASR are covered in this work: modeling, training, decoding, and external language model integration, accompanied by discussions of performance and deployment opportunities, as well as an outlook into potential future developments.

MLP Internal Representation as Discriminative Features for Improved Speaker Recognition

Morris, Koreman, 2006

Nonlinear Analyses and Algorithms for Speech Processing
