Modular neural networks exploit multiple front-ends to improve speech recognition systems

Antoniou, Christos A.; Reynolds, T.J.

doi:10.1109/kes.2000.885793

Cited by 3 publications

(2 citation statements)

References 9 publications

“…One major advantage of the TIMIT task is the availability of the time aligned phonetic transcription which facilitates the AF MLPs training. Furthermore, the TIMIT corpus has been used by many other researchers in the speech processing community [2,4,10] and the performance of our system is comparable to other researchers [5]. While context-dependent phoneme models will give a better performance, our goal here is to demonstrate the usefulness of the fusion and context-independent phoneme model can simplify our experiments significantly.…”

Section: Experiments and Results (mentioning)

confidence: 79%

“…While a single ASR system can perform quite well, many recent works are focusing on the integration of multiple systems [4,7,8,14,16,17] that result in a better performance than using a single system alone. Almost all of these focused on the integration during different stages of the recognition process, such as at frame level [7], state level [4,16,17] or word level [8]. For some system fusions, combination at the state level has been shown to be more effective [14].…”

Section: Introduction (mentioning)

confidence: 99%

Integration of acoustic and articulatory information with application to speech recognition

Leung; Siu. Information Fusion, 2004.

Experiments in speech recognition using a modular MLP architecture for acoustic modelling

Reynolds; Antoniou. Information Sciences, 2003.

Modular neural networks exploit large acoustic context through broad-class posteriors for continuous speech recognition

Antoniou. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)

Traditionally, neural networks such as multi-layer perceptrons handle acoustic context by increasing the dimensionality of the observation vector so that it includes the neighbouring acoustic vectors on either side of the current frame. As a result, the monolithic network is trained on a high-dimensional space. The trend is to use the same fixed-size observation vector for the single network that estimates the posterior probabilities for all phones simultaneously. We propose a decomposition of the network into modular components, where each component estimates a phone posterior. The size of the observation vector we use is not fixed across the modularised networks, but rather depends on the phone that each network is trained to classify. For each observation vector, we estimate very large acoustic context through broad-class posteriors. The use of the broad-class posteriors along with the phone posteriors greatly enhances acoustic modelling. We report significant improvements in phone classification and word recognition on the TIMIT corpus. Our results are also better than the best context-dependent system in the literature.
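The decomposition described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PhoneModule`, the helper `stack_context`, the single-layer sigmoid scorer, the random weights, the per-phone context widths, and the final normalisation across modules are all assumptions made for illustration. The broad-class posterior inputs are omitted. The only idea taken from the abstract is that each phone has its own module and its own observation-vector size.

```python
import numpy as np


def stack_context(frames, t, half_width):
    """Concatenate frames t-half_width .. t+half_width into one vector.

    Indices outside the utterance are clamped to the first or last frame.
    """
    n_frames = len(frames)
    idx = np.clip(np.arange(t - half_width, t + half_width + 1), 0, n_frames - 1)
    return frames[idx].reshape(-1)


class PhoneModule:
    """Hypothetical one-phone scorer with its own context width."""

    def __init__(self, n_features, half_width, rng):
        self.half_width = half_width
        dim = n_features * (2 * half_width + 1)
        # Untrained random weights, used only to make the sketch runnable.
        self.w = rng.normal(scale=0.1, size=dim)
        self.b = 0.0

    def score(self, frames, t):
        x = stack_context(frames, t, self.half_width)
        return 1.0 / (1.0 + np.exp(-(self.w @ x + self.b)))


def phone_posteriors(modules, frames, t):
    """Score frame t with every module and normalise the scores to sum to 1.

    The normalisation step is an assumption of this sketch.
    """
    scores = np.array([m.score(frames, t) for m in modules.values()])
    return dict(zip(modules, scores / scores.sum()))


rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 13))      # 20 frames of 13 made-up features
modules = {
    "aa": PhoneModule(13, 1, rng),      # narrow context (assumed value)
    "s": PhoneModule(13, 4, rng),       # wide context (assumed value)
}
posteriors = phone_posteriors(modules, frames, t=5)
```

Because each module builds its own input from the shared frame sequence, changing the context width for one phone leaves the input dimensionality of every other module untouched.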
