2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7179005

Recurrent neural network language model training with noise contrastive estimation for speech recognition

Abstract: In recent years recurrent neural network language models (RNNLMs) have been successfully applied to a range of tasks including speech recognition. However, an important issue that limits the quantity of data used, and their possible application areas, is the computational cost in training. A significant part of this cost is associated with the softmax function at the output layer, as this requires a normalization term to be explicitly calculated. This impacts both the training and testing speed, especially whe…
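The normalization bottleneck described in the abstract is easy to see in a minimal sketch of an RNNLM output layer (plain NumPy, hypothetical sizes and variable names, not the authors' implementation): the softmax denominator requires a score for every vocabulary word, whereas an unnormalized score for one word touches only a single row of the output matrix.

import numpy as np

# Hypothetical sizes: RNNLM hidden state and a large output vocabulary.
hidden_dim, vocab_size = 512, 100_000
rng = np.random.default_rng(0)
h = rng.standard_normal(hidden_dim)                # current hidden state
W = rng.standard_normal((vocab_size, hidden_dim))  # output word weights
b = np.zeros(vocab_size)                           # output biases

def softmax_prob(word_id):
    # Full softmax: the normalization term sums exp(score) over the ENTIRE
    # vocabulary, i.e. O(vocab_size * hidden_dim) work for every word.
    scores = W @ h + b
    return np.exp(scores[word_id]) / np.exp(scores).sum()

def unnormalized_score(word_id):
    # What NCE-style training exploits: a single unnormalized score needs
    # only one row of W, i.e. O(hidden_dim) work, provided the model has
    # been trained to be (approximately) self-normalized.
    return np.exp(W[word_id] @ h + b[word_id])

With these sizes, softmax_prob touches all 100,000 output rows for every predicted word, which is the per-word cost that NCE-based training avoids.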

Cited by 79 publications (92 citation statements)
References 16 publications (33 reference statements)
“…Noise contrastive estimation (NCE) is another sampling-based technique (Hyvärinen, 2010; Mnih and Teh, 2012; Chen et al., 2015). Contrary to target sampling, it does not maximize the training data likelihood directly.…”
Section: Noise Contrastive Estimation
Citation type: mentioning
Confidence: 99%
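The excerpt above summarizes the idea behind NCE. The sketch below is a minimal illustration of that objective for a single target word, not the implementation of Chen et al. (2015); the function names, the callable interfaces, and the choice of a unigram noise distribution are assumptions made for the example.

import numpy as np

def nce_loss(model_score, target_id, noise_ids, noise_logprob, k):
    """Noise contrastive estimation loss for one data word.

    model_score(w)   -> unnormalized log-score of word w given the history
                        (used as if normalized, per NCE's self-normalization)
    noise_logprob(w) -> log q(w) under the noise distribution (e.g. a unigram)
    noise_ids        -> k words sampled from the noise distribution
    """
    def logit(w):
        # P(data | w) = p_model(w) / (p_model(w) + k * q(w)) = sigmoid(logit)
        return model_score(w) - (np.log(k) + noise_logprob(w))

    # Binary classification: label the data word as data, noise words as noise.
    loss = np.log1p(np.exp(-logit(target_id)))     # -log sigmoid(logit)
    for w in noise_ids:
        loss += np.log1p(np.exp(logit(w)))         # -log(1 - sigmoid(logit))
    return loss

# Toy usage with made-up scores over a 3-word vocabulary.
score = {0: -2.0, 1: -5.0, 2: -4.5}                       # log p_model(w | h)
noise = {0: np.log(0.5), 1: np.log(0.3), 2: np.log(0.2)}  # log q(w)
print(nce_loss(lambda w: score[w], target_id=0,
               noise_ids=[1, 2], noise_logprob=lambda w: noise[w], k=2))

Because neither term of the loss involves a sum over the full vocabulary, the per-word training cost is independent of the vocabulary size, which is the source of the speed-ups discussed in the excerpts that follow.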
“…Therefore, our decision to implement all methods in a shared codebase, which ensured a fair comparison of model quality, also prevented us from providing a meaningful evaluation of training speed, as the code and architecture were implicitly optimized to favour the most demanding method (MLE). Fortunately, there is ample evidence that NCE can provide large improvements to per-batch training speeds for NNLMs, ranging from a 2× speed-up for 20K-word vocabularies on a GPU (Chen et al, 2015) to more than 10× for 70K-word vocabularies on a CPU (Vaswani et al, 2013). Meanwhile, our experiments show that 1.2M batches are sufficient for MLE, NCE-T and NCE-M to achieve very high quality; that is, none of these methods made use of early stopping during their main training pass.…”
Section: Impact On Speed
Citation type: mentioning
Confidence: 99%
“…n-gram LMs dominated ASR for decades until RNNLMs [1] were introduced and found to give significant gains in performance. n-gram LM and RNNLM contributions are complementary and state-of-the-art ASR systems involve interpolation between the two types of models [1,2,3,4,5,6,7].…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
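The interpolation referred to in this excerpt is most often a per-word linear mixture of the two models' probabilities; the sketch below assumes that simple form with a single fixed weight (an illustrative choice, not a detail taken from the cited systems, where the weight is normally tuned on held-out data).

def interpolate_lm(p_rnnlm, p_ngram, lam=0.5):
    # Linear interpolation of two language models for one word:
    #   P(w | h) = lam * P_rnnlm(w | h) + (1 - lam) * P_ngram(w | h)
    return lam * p_rnnlm + (1.0 - lam) * p_ngram

# Example: combining per-word probabilities from the two models.
print(interpolate_lm(0.012, 0.008, lam=0.7))   # 0.7*0.012 + 0.3*0.008 = 0.0108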
“…RNNLMs trained on a text corpus provide an implicit modelling of such contextual factors. It has been found that feature-based adaptation of RNNLMs by augmenting the input with domain-specific auxiliary features provides significant improvements in both perplexity (PPL) and word error rate (WER) [8,2,9,10,4,6,11]. Such features, however, can also include acoustic embeddings [12,13] derived from audio, which might be available for only a subset of the text data, such as the matched in-domain data used for finetuning.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
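The feature-based adaptation described in this last excerpt amounts to feeding an auxiliary domain or acoustic embedding into the recurrent layer alongside the usual word input. The sketch below shows that wiring for one step of a plain tanh RNN; the layer sizes and names are hypothetical, and real systems typically use LSTM or GRU cells.

import numpy as np

def rnnlm_step_with_aux(word_emb, aux_feat, h_prev, W_in, W_aux, W_rec, b):
    # One recurrent step of an RNNLM whose input is augmented with an
    # auxiliary feature vector (e.g. a topic, domain, or acoustic embedding).
    # The only change from a plain RNNLM step is the W_aux @ aux_feat term.
    return np.tanh(W_in @ word_emb + W_aux @ aux_feat + W_rec @ h_prev + b)

# Hypothetical sizes: 256-dim word embedding, 32-dim auxiliary feature,
# 512-dim hidden state.
rng = np.random.default_rng(0)
e, a, h0 = rng.standard_normal(256), rng.standard_normal(32), np.zeros(512)
W_in  = 0.01 * rng.standard_normal((512, 256))
W_aux = 0.01 * rng.standard_normal((512, 32))
W_rec = 0.01 * rng.standard_normal((512, 512))
b = np.zeros(512)
h1 = rnnlm_step_with_aux(e, a, h0, W_in, W_aux, W_rec, b)

Concatenating the auxiliary feature onto the word embedding before a single input matrix is an equivalent formulation of the same augmentation.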