2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
DOI: 10.1109/icassp.2001.940878
Modular neural networks exploit large acoustic context through broad-class posteriors for continuous speech recognition

Abstract: Traditionally, neural networks such as multi-layer perceptrons handle acoustic context by increasing the dimensionality of the observation vector to include information from the neighbouring acoustic vectors on either side of the current frame. As a result, the monolithic network is trained on a high-dimensional space. The trend is to use the same fixed-size observation vector across the one network that estimates the posterior probabilities for all phones simultaneously. We propose a decomposit…
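The context handling the abstract describes, stacking each frame with its left and right neighbours into one wider observation vector, can be sketched as below. The window sizes and feature dimensionality are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def stack_context(frames, left=4, right=4):
    """Widen each acoustic frame with `left` and `right` neighbouring
    frames, as a monolithic MLP front-end would. `frames` has shape
    (T, D); the result has shape (T, (left + right + 1) * D).
    Edge frames are padded by repeating the first/last frame."""
    T, D = frames.shape
    padded = np.concatenate([
        np.repeat(frames[:1], left, axis=0),   # pad the left edge
        frames,
        np.repeat(frames[-1:], right, axis=0), # pad the right edge
    ])
    return np.stack([padded[t:t + left + right + 1].reshape(-1)
                     for t in range(T)])

X = np.random.randn(100, 13)   # e.g. 100 frames of 13-dim MFCCs
print(stack_context(X).shape)  # (100, 117): 9 frames x 13 dims
```

With a 4+1+4 window the 13-dimensional input grows ninefold, which is exactly the dimensionality blow-up the proposed decomposition aims to avoid.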

Cited by 7 publications (3 citation statements); references 11 publications.
“…Existing speech-analysis technology uses, almost exclusively, phoneme-probability scores that are output by a conventional speech recognizer. Given state-of-the-art automatic phoneme recognition accuracy of 76% on speech from non-hearing-impaired adults [4] and increased acoustic variability observed in children's speech [5], it is not surprising that the success of this phoneme-recognition approach has been limited.…”
Section: Introduction
confidence: 99%
“…Results of experiments have shown that the specific deep neural network outperformed the single-DNN-based speech enhancement, with an accuracy of 94.1%. Christos Antoniou [10] proposed a modular-neural-network design for broad classification in which the observation vector is not fixed in size. Phones were divided into seven classes (vowels, plosives, fricatives, nasals, diphthongs, semi-vowels, closures).…”
Section: Literature Review
confidence: 99%
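The seven-way broad-class decomposition quoted above can be made concrete with a small lookup table. The phone memberships below are a hypothetical illustration using ARPAbet-style labels; the actual inventory in [10] may differ:

```python
# Hypothetical broad-class table (ARPAbet-style labels);
# the exact phone inventory is an assumption for illustration.
BROAD_CLASSES = {
    "vowels":      {"aa", "ae", "ah", "ih", "iy", "uw"},
    "plosives":    {"p", "t", "k", "b", "d", "g"},
    "fricatives":  {"f", "v", "s", "z", "sh", "th"},
    "nasals":      {"m", "n", "ng"},
    "diphthongs":  {"ay", "aw", "ey", "ow", "oy"},
    "semi-vowels": {"l", "r", "w", "y"},
    "closures":    {"bcl", "dcl", "gcl", "kcl", "pcl", "tcl"},
}

def broad_class(phone):
    """Map a phone label to its broad class, or None if unknown.
    In a modular system, this routing decides which expert network
    (with its own context-window size) scores the frame."""
    for name, members in BROAD_CLASSES.items():
        if phone in members:
            return name
    return None

print(broad_class("iy"))   # vowels
print(broad_class("bcl"))  # closures
```

Each broad class can then be handled by its own expert network, so classes with long temporal cues (e.g. diphthongs) may use wider context windows than short, transient classes (e.g. plosives).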
“…Different features carrying temporal information, such as HATS [6], TRAPS [5], and MRASTA [7], were shown to be complementary to short-term features. Hierarchical or parallel MLP structures [8,10,11,12,13,14] and MLPs with two or three hidden layers [9,15] were also shown to achieve better performance.…”
Section: Introduction
confidence: 99%