2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7179005

Recurrent neural network language model training with noise contrastive estimation for speech recognition

Abstract: In recent years recurrent neural network language models (RNNLMs) have been successfully applied to a range of tasks including speech recognition. However, an important issue that limits the quantity of data used, and their possible application areas, is the computational cost in training. A significant part of this cost is associated with the softmax function at the output layer, as this requires a normalization term to be explicitly calculated. This impacts both the training and testing speed, especially whe…
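The normalization bottleneck described in the abstract is easy to see in a minimal sketch of an RNNLM output layer (plain NumPy, hypothetical sizes and variable names, not the authors' implementation): the softmax denominator requires a score for every vocabulary word, whereas an unnormalized score for one word touches only a single row of the output matrix.

import numpy as np

# Hypothetical sizes: RNNLM hidden state and a large output vocabulary.
hidden_dim, vocab_size = 512, 100_000
rng = np.random.default_rng(0)
h = rng.standard_normal(hidden_dim)                # current hidden state
W = rng.standard_normal((vocab_size, hidden_dim))  # output word weights
b = np.zeros(vocab_size)                           # output biases

def softmax_prob(word_id):
    # Full softmax: the normalization term sums exp(score) over the ENTIRE
    # vocabulary, i.e. O(vocab_size * hidden_dim) work for every word.
    scores = W @ h + b
    return np.exp(scores[word_id]) / np.exp(scores).sum()

def unnormalized_score(word_id):
    # What NCE-style training exploits: a single unnormalized score needs
    # only one row of W, i.e. O(hidden_dim) work, provided the model has
    # been trained to be (approximately) self-normalized.
    return np.exp(W[word_id] @ h + b[word_id])

With these sizes, softmax_prob touches all 100,000 output rows for every predicted word, which is the per-word cost that NCE-based training avoids.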

Cited by 79 publications (92 citation statements)
References 16 publications (33 reference statements)
“…Noise contrastive estimation (NCE) is another sampling-based technique (Hyvärinen, 2010; Mnih and Teh, 2012; Chen et al., 2015). Contrary to target sampling, it does not maximize the training data likelihood directly.…”
Section: Noise Contrastive Estimation
Citation type: mentioning
Confidence: 99%
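The excerpt above summarizes the idea behind NCE. The sketch below is a minimal illustration of that objective for a single target word, not the implementation of Chen et al. (2015); the function names, the callable interfaces, and the choice of a unigram noise distribution are assumptions made for the example.

import numpy as np

def nce_loss(model_score, target_id, noise_ids, noise_logprob, k):
    """Noise contrastive estimation loss for one data word.

    model_score(w)   -> unnormalized log-score of word w given the history
                        (used as if normalized, per NCE's self-normalization)
    noise_logprob(w) -> log q(w) under the noise distribution (e.g. a unigram)
    noise_ids        -> k words sampled from the noise distribution
    """
    def logit(w):
        # P(data | w) = p_model(w) / (p_model(w) + k * q(w)) = sigmoid(logit)
        return model_score(w) - (np.log(k) + noise_logprob(w))

    # Binary classification: label the data word as data, noise words as noise.
    loss = np.log1p(np.exp(-logit(target_id)))     # -log sigmoid(logit)
    for w in noise_ids:
        loss += np.log1p(np.exp(logit(w)))         # -log(1 - sigmoid(logit))
    return loss

# Toy usage with made-up scores over a 3-word vocabulary.
score = {0: -2.0, 1: -5.0, 2: -4.5}                       # log p_model(w | h)
noise = {0: np.log(0.5), 1: np.log(0.3), 2: np.log(0.2)}  # log q(w)
print(nce_loss(lambda w: score[w], target_id=0,
               noise_ids=[1, 2], noise_logprob=lambda w: noise[w], k=2))

Because neither term of the loss involves a sum over the full vocabulary, the per-word training cost is independent of the vocabulary size, which is the source of the speed-ups discussed in the excerpts that follow.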
“…Therefore, our decision to implement all methods in a shared codebase, which ensured a fair comparison of model quality, also prevented us from providing a meaningful evaluation of training speed, as the code and architecture were implicitly optimized to favour the most demanding method (MLE). Fortunately, there is ample evidence that NCE can provide large improvements to per-batch training speeds for NNLMs, ranging from a 2× speed-up for 20K-word vocabularies on a GPU (Chen et al, 2015) to more than 10× for 70K-word vocabularies on a CPU (Vaswani et al, 2013). Meanwhile, our experiments show that 1.2M batches are sufficient for MLE, NCE-T and NCE-M to achieve very high quality; that is, none of these methods made use of early stopping during their main training pass.…”
Section: Impact On Speed
Citation type: mentioning
Confidence: 99%
“…n-gram LMs dominated ASR for decades until RNNLMs [1] were introduced and found to give significant gains in performance. n-gram LM and RNNLM contributions are complementary and state-of-the-art ASR systems involve interpolation between the two types of models [1,2,3,4,5,6,7].…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
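The interpolation referred to in this excerpt is most often a per-word linear mixture of the two models' probabilities; the sketch below assumes that simple form with a single fixed weight (an illustrative choice, not a detail taken from the cited systems, where the weight is normally tuned on held-out data).

def interpolate_lm(p_rnnlm, p_ngram, lam=0.5):
    # Linear interpolation of two language models for one word:
    #   P(w | h) = lam * P_rnnlm(w | h) + (1 - lam) * P_ngram(w | h)
    return lam * p_rnnlm + (1.0 - lam) * p_ngram

# Example: combining per-word probabilities from the two models.
print(interpolate_lm(0.012, 0.008, lam=0.7))   # 0.7*0.012 + 0.3*0.008 = 0.0108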
“…RNNLMs trained on a text corpus provide an implicit modelling of such contextual factors. It has been found that feature-based adaptation of RNNLMs by augmenting the input with domain-specific auxiliary features provides significant improvements in both perplexity (PPL) and word error rate (WER) [8,2,9,10,4,6,11]. Such features, however, can also include acoustic embeddings [12,13] derived from audio, which might be available for only a subset of the text data, such as the matched in-domain data used for finetuning.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
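The feature-based adaptation described in this last excerpt amounts to feeding an auxiliary domain or acoustic embedding into the recurrent layer alongside the usual word input. The sketch below shows that wiring for one step of a plain tanh RNN; the layer sizes and names are hypothetical, and real systems typically use LSTM or GRU cells.

import numpy as np

def rnnlm_step_with_aux(word_emb, aux_feat, h_prev, W_in, W_aux, W_rec, b):
    # One recurrent step of an RNNLM whose input is augmented with an
    # auxiliary feature vector (e.g. a topic, domain, or acoustic embedding).
    # The only change from a plain RNNLM step is the W_aux @ aux_feat term.
    return np.tanh(W_in @ word_emb + W_aux @ aux_feat + W_rec @ h_prev + b)

# Hypothetical sizes: 256-dim word embedding, 32-dim auxiliary feature,
# 512-dim hidden state.
rng = np.random.default_rng(0)
e, a, h0 = rng.standard_normal(256), rng.standard_normal(32), np.zeros(512)
W_in  = 0.01 * rng.standard_normal((512, 256))
W_aux = 0.01 * rng.standard_normal((512, 32))
W_rec = 0.01 * rng.standard_normal((512, 512))
b = np.zeros(512)
h1 = rnnlm_step_with_aux(e, a, h0, W_in, W_aux, W_rec, b)

Concatenating the auxiliary feature onto the word embedding before a single input matrix is an equivalent formulation of the same augmentation.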