1994
DOI: 10.1109/72.279192

An application of recurrent nets to phone probability estimation

Abstract: This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed; a role for which the recurrent net appears suitable. An overview of early developments of recurrent nets for phone recognition is given along with the more recent improvements that include their integration with Markov models. Recognition results are presented for the DARPA TIMIT and Resource Management tasks, …
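As a rough illustration of the hybrid scheme the abstract alludes to (recurrent-net phone posteriors combined with a Markov model), the sketch below converts per-frame posteriors into scaled likelihoods and runs a Viterbi pass over a phone transition matrix. The function name, the simple phone-loop topology, and the uniform initial state are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def viterbi_phone_decode(posteriors, phone_priors, log_trans, eps=1e-10):
    """Combine per-frame phone posteriors from a recurrent net with a
    Markov model (the hybrid scheme referred to in the abstract).

    posteriors   : (T, P) network outputs, one distribution per frame
    phone_priors : (P,)   phone priors estimated from training labels
    log_trans    : (P, P) log transition probabilities between phones

    Posteriors are divided by priors to give scaled likelihoods, which
    then act as emission scores in a standard Viterbi pass.
    """
    T, P = posteriors.shape
    log_emit = np.log(posteriors + eps) - np.log(phone_priors + eps)

    delta = log_emit[0].copy()               # best score ending in each phone
    backptr = np.zeros((T, P), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # rows: previous phone, cols: current
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]

    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path
```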

Cited by 367 publications (168 citation statements)
References 27 publications
“…Hybrids of hidden Markov models (HMMs) and artificial neural networks (ANNs) were proposed by several researchers in the 1990s as a way of overcoming the drawbacks of HMMs (Bourlard and Morgan, 1994; Bengio, 1993; Robinson, 1994; Bengio, 1999). The introduction of ANNs was intended to provide more discriminant training, improved modelling of phoneme duration, richer, nonlinear function approximation, and perhaps most importantly, increased use of contextual information.…”
Section: Introduction
confidence: 99%
“…This system was tested with the Wall Street Journal database. The TIMIT results came from a hybrid RNN/HMM in 1994 (Robinson, 1994). The inputs to the neural network are features extracted using a long left context.…”
Section: Overview of Current and Past Research on TIMIT Phone Recognition
confidence: 99%
“…Spoken utterances are represented as arrays of phoneme probabilities. A recurrent neural network similar to [24] processes RASTA-PLP coefficients [25] to estimate phoneme and speech/silence probabilities. The RNN has 12 input units, 176 hidden units, and 40 output units.…”
Section: Representing and Comparing Spoken Utterances
confidence: 99%
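For concreteness, below is a minimal sketch of a recurrent net with the shape given in that citation statement: 12 RASTA-PLP inputs, 176 hidden units, and 40 outputs (phone classes plus speech/silence). The random initialisation, the tanh/softmax choice, and the forward function are assumptions made only to keep the sketch self-contained and runnable; they are not details from the cited system.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HIDDEN, N_OUT = 12, 176, 40   # dimensions quoted in the citation statement

# Randomly initialised weights, purely so the sketch runs end to end.
W_in  = rng.normal(scale=0.1, size=(N_IN, N_HIDDEN))
W_rec = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN))
W_out = rng.normal(scale=0.1, size=(N_HIDDEN, N_OUT))
b_h   = np.zeros(N_HIDDEN)
b_o   = np.zeros(N_OUT)

def forward(rasta_plp_frames):
    """rasta_plp_frames: (T, 12) array of RASTA-PLP coefficients.
    Returns (T, 40) per-frame probabilities (phones plus speech/silence)."""
    h = np.zeros(N_HIDDEN)
    out = []
    for x in rasta_plp_frames:
        h = np.tanh(x @ W_in + h @ W_rec + b_h)   # recurrent state carries left context
        z = h @ W_out + b_o
        e = np.exp(z - z.max())
        out.append(e / e.sum())                   # softmax over the 40 output classes
    return np.array(out)

# Example: 100 frames of synthetic input in place of real RASTA-PLP features.
probs = forward(rng.normal(size=(100, N_IN)))
assert probs.shape == (100, N_OUT)
```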