2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7472829

CUED-RNNLM — An open-source toolkit for efficient training and evaluation of recurrent neural network language models

Abstract: In recent years, recurrent neural network language models (RNNLMs) have become increasingly popular for a range of applications including speech recognition. However, the training of RNNLMs is computationally expensive, which limits the quantity of data, and size of network, that can be used. In order to fully exploit the power of RNNLMs, efficient training implementations are required. This paper introduces an open-source toolkit, the CUED-RNNLM toolkit, which supports efficient GPU-based training of RNNLMs. …
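To make the model class concrete, the sketch below shows a minimal recurrent language model and a single cross-entropy training step in PyTorch. It is purely illustrative: the layer sizes, names, and training loop are assumptions chosen for exposition, not the CUED-RNNLM toolkit's API or implementation.

```python
# Illustrative RNN language model sketch (not the CUED-RNNLM implementation).
# All sizes and names are assumptions chosen for exposition.
import torch
import torch.nn as nn

class SimpleRNNLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        # tokens: (batch, seq_len) integer word indices
        out, hidden = self.rnn(self.embed(tokens), hidden)
        return self.proj(out), hidden  # logits over the output vocabulary

vocab_size = 10_000                      # toy vocabulary size (assumed)
model = SimpleRNNLM(vocab_size)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random toy batch: predict token t+1 from tokens <= t.
batch = torch.randint(0, vocab_size, (8, 20))
inputs, targets = batch[:, :-1], batch[:, 1:]
logits, _ = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
print(f"toy-batch cross-entropy: {loss.item():.3f}")
```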


citations
Cited by 73 publications
(48 citation statements)
references
References 23 publications
0
48
0
Order By: Relevance
“…In addition, a recurrent neural network language model (RNNLM) [29] was also used to refine the result of the first-pass decoding. The CUED-RNNLM Toolkit v1.0 [30] was used to train the RNNLM. (Footnote 1: the splicing indexes per layer can be described as {-1,0,1} {-1,0,1} {-1,0,1,2} {-3,0,3} {-3,0,3} {-6,-3,0} {0}, using the notation of [8,11]. Footnote 2: the architecture can be described as {-2,-1,0,1,2} {-1,0,1} L {-3,0,3} {-3,0,3} L {-3,0,3} {-3,0,3} L, where L represents an LSTMP layer with 512 cells and 128-dimensional recurrent and non-recurrent projections, using the notation of [8,11].)…”
Section: Methods (mentioning)
confidence: 99%
“…Our RNN-LMs are trained and evaluated using the CUED-RNNLM toolkit [58]. Our RNN-LM configuration has several distinctive features, as described below.…”
Section: RNN-LM Setup (mentioning)
confidence: 99%
“…Kneser-Ney smoothing is used for building the 4-gram LM using the corresponding options provided in SRILM. The RNNLMs were trained using a modified version of the CUED-RNNLM toolkit [50]. For training the baseline RNNLM, we used the full LM 1 & LM 2 text, together with a 60k vocabulary for the input word list and a 50k vocabulary for the output word list.…”
Section: ASR Results: Experimental Setup (mentioning)
confidence: 99%
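Several of the citing systems combine the RNNLM with an n-gram LM when rescoring first-pass hypotheses or reporting perplexity. The snippet below sketches the standard linear interpolation of the two models' per-word probabilities; the probability values, weight, and function names are hypothetical placeholders rather than output of CUED-RNNLM or SRILM.

```python
# Hypothetical sketch of linear LM interpolation, a standard way to combine
# an n-gram LM with an RNNLM. The per-word probabilities are made up here;
# in a real system they would come from the respective language models.
import math

def interpolated_logprob(p_ngram, p_rnnlm, lam=0.5):
    """P(w | h) = lam * P_rnnlm(w | h) + (1 - lam) * P_ngram(w | h)."""
    return math.log(lam * p_rnnlm + (1.0 - lam) * p_ngram)

def perplexity(ngram_probs, rnnlm_probs, lam=0.5):
    logps = [interpolated_logprob(pn, pr, lam)
             for pn, pr in zip(ngram_probs, rnnlm_probs)]
    return math.exp(-sum(logps) / len(logps))

# Toy example with invented probabilities for a four-word hypothesis.
ngram_probs = [0.02, 0.10, 0.05, 0.30]
rnnlm_probs = [0.04, 0.08, 0.09, 0.25]
print(f"interpolated perplexity: {perplexity(ngram_probs, rnnlm_probs):.2f}")
```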