2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
DOI: 10.1109/icassp.2001.940878
Modular neural networks exploit large acoustic context through broad-class posteriors for continuous speech recognition

Abstract: Traditionally, neural networks such as multi-layer perceptrons handle acoustic context by increasing the dimensionality of the observation vector to include information from the neighbouring acoustic vectors on either side of the current frame. As a result, the monolithic network is trained on a high-dimensional space. The trend is to use the same fixed-size observation vector across the one network that estimates the posterior probabilities for all phones simultaneously. We propose a decomposit…
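The context handling the abstract describes, stacking each frame with its left and right neighbours into one wider observation vector, can be sketched as below. The window sizes and feature dimensionality are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def stack_context(frames, left=4, right=4):
    """Widen each acoustic frame with `left` and `right` neighbouring
    frames, as a monolithic MLP front-end would. `frames` has shape
    (T, D); the result has shape (T, (left + right + 1) * D).
    Edge frames are padded by repeating the first/last frame."""
    T, D = frames.shape
    padded = np.concatenate([
        np.repeat(frames[:1], left, axis=0),   # pad the left edge
        frames,
        np.repeat(frames[-1:], right, axis=0), # pad the right edge
    ])
    return np.stack([padded[t:t + left + right + 1].reshape(-1)
                     for t in range(T)])

X = np.random.randn(100, 13)   # e.g. 100 frames of 13-dim MFCCs
print(stack_context(X).shape)  # (100, 117): 9 frames x 13 dims
```

With a 4+1+4 window the 13-dimensional input grows ninefold, which is exactly the dimensionality blow-up the proposed decomposition aims to avoid.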

Cited by 7 publications (3 citation statements); references 11 publications.
“…Existing speech-analysis technology uses, almost exclusively, phoneme-probability scores that are output by a conventional speech recognizer. Given state-of-the-art automatic phoneme recognition accuracy of 76% on speech from non-hearing-impaired adults [4] and increased acoustic variability observed in children's speech [5], it is not surprising that the success of this phoneme-recognition approach has been limited.…”
Section: Introduction
confidence: 99%
“…Results of experiments have shown that the specific deep neural network outperformed the single-DNN-based speech enhancement, with an accuracy of 94.1%. Christos Antoniou [10] proposed a modular-neural-network design for broad classification in which the observation vector is not fixed in size. Phones were divided into seven classes (vowels, plosives, fricatives, nasals, diphthongs, semi-vowels, closures).…”
Section: Literature Review
confidence: 99%
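The seven-way broad-class decomposition quoted above can be made concrete with a small lookup table. The phone memberships below are a hypothetical illustration using ARPAbet-style labels; the actual inventory in [10] may differ:

```python
# Hypothetical broad-class table (ARPAbet-style labels);
# the exact phone inventory is an assumption for illustration.
BROAD_CLASSES = {
    "vowels":      {"aa", "ae", "ah", "ih", "iy", "uw"},
    "plosives":    {"p", "t", "k", "b", "d", "g"},
    "fricatives":  {"f", "v", "s", "z", "sh", "th"},
    "nasals":      {"m", "n", "ng"},
    "diphthongs":  {"ay", "aw", "ey", "ow", "oy"},
    "semi-vowels": {"l", "r", "w", "y"},
    "closures":    {"bcl", "dcl", "gcl", "kcl", "pcl", "tcl"},
}

def broad_class(phone):
    """Map a phone label to its broad class, or None if unknown.
    In a modular system, this routing decides which expert network
    (with its own context-window size) scores the frame."""
    for name, members in BROAD_CLASSES.items():
        if phone in members:
            return name
    return None

print(broad_class("iy"))   # vowels
print(broad_class("bcl"))  # closures
```

Each broad class can then be handled by its own expert network, so classes with long temporal cues (e.g. diphthongs) may use wider context windows than short, transient classes (e.g. plosives).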
“…Different features carrying temporal information, such as HATS [6], TRAPS [5], and MRASTA [7], were shown to be complementary to short-term features. Hierarchical or parallel MLP structures [8,10,11,12,13,14] and MLPs with two or three hidden layers [9,15] were also shown to achieve better performance.…”
Section: Introduction
confidence: 99%