Efficient One-Pass Decoding with NNLM for Speech Recognition

Shi, Yongzhe; Zhang, Weiqiang; Cai, Meng; Liu, Jia

doi:10.1109/lsp.2014.2303136

Cited by 25 publications

(25 citation statements)

References 9 publications


“…One important practical issue associated with RNNLMs is the computational cost incurred in model training. This limits the quantity of data and their possible application areas, and therefore has drawn increasing research interest in recent years [2,11,12,5,13,10,14,15].…”

Section: Introduction (mentioning)

confidence: 99%

“…One technique that can be used to improve the testing speed is introducing the variance of the normalization term into the conventional cross entropy based objective function. This has been applied to training of feedforward NNLMs, class based [13,10,14] and full output RNNLMs [16]. By minimizing the variance of the normalization term during training, the normalization term at the output layer can be ignored during testing time thus gaining significant improvements in speed.…”

Section: Introduction (mentioning)

confidence: 99%
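The variance-regularised objective described in the statement above can be sketched as follows. This is a minimal NumPy illustration, not the cited authors' implementation: the function names, the weight `gamma`, and the choice to subtract the mean log-normaliser at test time are all assumptions made for the example.

```python
import numpy as np

def log_normalizers(logits):
    """Return log Z for each row of logits, computed in a numerically stable way."""
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).ravel()

def variance_regularized_loss(logits, targets, gamma=0.5):
    """Cross entropy plus a penalty on the variance of log Z (gamma is a hypothetical weight)."""
    log_z = log_normalizers(logits)
    target_logits = logits[np.arange(len(targets)), targets]
    # Standard cross entropy: -log softmax(target) = log Z - target logit.
    cross_entropy = np.mean(log_z - target_logits)
    # Penalise how much log Z varies across the batch of histories.
    penalty = 0.5 * gamma * np.mean((log_z - log_z.mean()) ** 2)
    return cross_entropy + penalty

def unnormalized_log_score(logits, targets, mean_log_z):
    """Test-time score that skips the softmax sum and uses one constant log Z instead."""
    return logits[np.arange(len(targets)), targets] - mean_log_z
```

If training drives the variance of log Z close to zero, `unnormalized_log_score` approximates the true log probability while reading only one output unit per word, which is where the test-time saving comes from.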


Recurrent neural network language model training with noise contrastive estimation for speech recognition

Chen

Liu

Gales

et al. 2015

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


In recent years recurrent neural network language models (RNNLMs) have been successfully applied to a range of tasks including speech recognition. However, an important issue that limits the quantity of data used, and their possible application areas, is the computational cost in training. A significant part of this cost is associated with the softmax function at the output layer, as this requires a normalization term to be explicitly calculated. This impacts both the training and testing speed, especially when a large output vocabulary is used. To address this problem, noise contrastive estimation (NCE) is explored in RNNLM training. NCE does not require the above normalization during both training and testing. It is insensitive to the output layer size. On a large vocabulary conversational telephone speech recognition task, a doubling in training speed on a GPU and a 56 times speed up in test time evaluation on a CPU were obtained.
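To make the idea of avoiding the softmax normalisation concrete, the per-word NCE loss can be sketched as below. This is a generic textbook form of noise contrastive estimation rather than the code used in the cited paper; the function name, the argument names, and the fixed constant `log_z` are assumptions for illustration.

```python
import numpy as np

def nce_loss(target_score, target_noise_prob, noise_scores, noise_probs, log_z=0.0):
    """Per-word NCE loss with the normaliser fixed to a constant (hypothetical log_z)."""
    noise_scores = np.asarray(noise_scores, dtype=float)
    noise_probs = np.asarray(noise_probs, dtype=float)
    k = len(noise_scores)  # number of noise samples drawn for this word
    # Unnormalised model probabilities: no sum over the vocabulary is needed.
    p_target = np.exp(target_score - log_z)
    p_noise = np.exp(noise_scores - log_z)
    # Posterior that the observed word came from the data rather than the noise.
    post_data = p_target / (p_target + k * target_noise_prob)
    # Posterior that each sampled word came from the noise distribution.
    kq = k * noise_probs
    post_noise = kq / (p_noise + kq)
    return -(np.log(post_data) + np.log(post_noise).sum())
```

The cost of this loss depends on the number of noise samples, not on the vocabulary size, which is consistent with the abstract's remark that NCE is insensitive to the output layer size.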


“…Another type of solution to speedup evaluation of NNLMs has been proposed both in [12] (variance regularisation) and [10] (self-norm). The variance of the softmax log normalisation is added into the objective function for optimisation.…”

Section: F-RNNLM with Variance Regularisation (mentioning)

confidence: 99%

“…The second method allows the RNNLM to be used without softmax normalisation during testing, by training with an extra variance regularisation term in the training objective function. This approach was applied on feedforward NNLMs and class-based RNNLMs in previous work [12,10,13]. It can also be applied to full output layer RNNLMs.…”

Section: Introduction (mentioning)

confidence: 99%

Improving the training and evaluation efficiency of recurrent neural network language models

Chen

Liu

Gales

et al. 2015

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Recurrent neural network language models (RNNLMs) are becoming increasingly popular for speech recognition. Previously, we have shown that RNNLMs with a full (non-classed) output layer (F-RNNLMs) can be trained efficiently using a GPU giving a large reduction in training time over conventional class-based models (C-RNNLMs) on a standard CPU. However, since test-time RNNLM evaluation is often performed entirely on a CPU, standard F-RNNLMs are inefficient since the entire output layer needs to be calculated for normalisation. In this paper, it is demonstrated that C-RNNLMs can be efficiently trained on a GPU, using our spliced sentence bunch technique which allows good CPU test-time performance (42× speedup over F-RNNLM). Furthermore, the performance of different classing approaches is investigated. We also examine the use of variance regularisation of the softmax denominator for F-RNNLMs and show that it allows F-RNNLMs to be efficiently used in test (56× speedup on a CPU). Finally the use of two GPUs for F-RNNLM training using pipelining is described and shown to give a reduction in training time over a single GPU by a factor of 1.6×.


“…The language model of S1-10 is a word trigram language model, while S11 utilizes a feed-forward neural network language model with variance regularizations [19]. Besides, S11 employs our own decoder [19] while other systems employ the Kaldi decoder. The TWV results of our KWS systems after KST normalization are listed in Table 1.…”

Section: KWS Systems (mentioning)

confidence: 99%

Improved system fusion for keyword search

Cai

Lü

et al. 2015

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Self Cite


It has been demonstrated that system fusion can significantly improve the performance of keyword search. In this paper, we compare the performance of several widely-used arithmetic-based fusion methods using different normalization pipeline and try to find the best pipeline. A novel arithmetic-based fusion method is proposed in this work. The method supplies a more effective way to incorporate the number of systems which have non-zero scores for a detection. When tested on the development test dataset of the OpenKWS15 Evaluation, the proposed method achieves the highest maximum term-weighted value (MTWV) and actual term-weighted value (ATWV) among all other arithmetic-based fusion methods. Usually, discriminative fusion methods employing classifiers can outperform arithmetic-based fusion methods. A DNN-based fusion method is explored in this work. After word-burst information is added, the DNN-based fusion method outperforms all other methods. In addition, it is notable that our arithmetic-based method achieves the same MTWV as the DNN-based method.
