Recurrent neural network language model adaptation for multi-genre broadcast speech recognition

Chen, X.; Tan, Tian; Liu, Xunying; Lanchantin, Pierre; Wan, Moquan; Gales, Mark J. F.; Woodland, Philip C.

doi:10.21437/interspeech.2015-696

Cited by 50 publications

(22 citation statements)

References 28 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The results show a very significant drop in perplexity when using RNNLMs but only a modest improvement in word error rate of 0.7%. This is consistent with the experiments reported on the same BBC data in [35]. The main difference, however is that in [35], instead of LM 1 as background language model, another corpus of 1 billion words was used for language modelling, and different topic models including LDA, were used to classify the text into a set of different genres.…”

Section: Multi-genre Language Modellingsupporting

confidence: 85%

The 2015 sheffield system for transcription of Multi-Genre Broadcast media

Saz

Doulaty

Deena

et al. 2015

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

View full text Add to dashboard Cite

We describe the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre broadcast shows. Transcription was one of four tasks proposed in the MGB challenge, with the aim of advancing the state of the art of automatic speech recognition, speaker diarisation and automatic alignment of subtitles for broadcast media. Four topics are investigated in this work: Data selection techniques for training with unreliable data, automatic speech segmentation of broadcast media shows, acoustic modelling and adaptation in highly variable environments, and language modelling of multigenre shows. The final system operates in multiple passes, using an initial unadapted decoding stage to refine segmentation, followed by three adapted passes: a hybrid DNN pass with input features normalised by speaker-based cepstral normalisation, another hybrid stage with input features normalised by speaker feature-MLLR transformations, and finally a bottleneck-based tandem stage with noise and speaker factorisation. The combination of these three system outputs provides a final error rate of 27.5% on the official development set, consisting of 47 multi-genre shows.

show abstract

Section: Multi-genre Language Modellingsupporting

confidence: 85%

The 2015 sheffield system for transcription of Multi-Genre Broadcast media

Saz

Doulaty

Deena

et al. 2015

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

View full text Add to dashboard Cite

show abstract

“…More recently, neural adapation approaches have been used to adapt a LM to a target domain based on non-linguistic contextual signals, such as the application at the time of the request [11], or learned topic vectors [12,13]. For example, arXiv:2101.03229v1 [cs.CL] 5 Jan 2021 [12] used topic representations obtained from latent dirichlet allocation to adapt an NLM for genres and shows in a multi-genre broadcast transcription task. Domain-adaptation can also be achieved via shallow-fusion, in which an external (contextually constrained) LM is integrated during beam search [14].…”

Section: Previous Workmentioning

confidence: 99%

Domain-Aware Neural Language Models for Speech Recognition

Liu

Gourav

et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

View full text Add to dashboard Cite

As voice assistants become more ubiquitous, they are increasingly expected to support and perform well on a wide variety of use-cases across different domains. We present a domainaware rescoring framework suitable for achieving domainadaptation during second-pass rescoring in production settings. In our framework, we fine-tune a domain-general neural language model on several domains, and use an LSTMbased domain classification model to select the appropriate domain-adapted model to use for second-pass rescoring. This domain-aware rescoring improves the word error rate by up to 2.4% and slot word error rate by up to 4.1% on three individual domains -shopping, navigation, and music -compared to domain general rescoring. These improvements are obtained while maintaining accuracy for the general use case.

show abstract

“…[17,17,18,19] investigated the use of topic information to build improved n-gram LMs for first-pass decoding. In [20,21], the topic information was modelled in RNNLMs as additional features and used for rescoring. [22] build a session-level LSTM-LM to capture the session-level information.…”

Section: Improving Lm With Long-term Historymentioning

confidence: 99%

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition

Chen,

Parthasarathy,

Gale

et al. 2020

Preprint

View full text Add to dashboard Cite

LSTM language models (LSTM-LMs) have been proven to be powerful and yielded significant performance improvements over count based n-gram LMs in modern speech recognition systems. Due to its infinite history states and computational load, most previous studies focus on applying LSTM-LMs in the second-pass for rescoring purpose. Recent work shows that it is feasible and computationally affordable to adopt the LSTM-LMs in the first-pass decoding within a dynamic (or tree based) decoder framework. In this work, the LSTM-LM is composed with a WFST decoder on-the-fly for the first-pass decoding. Furthermore, motivated by the long-term history nature of LSTM-LMs, the use of context beyond the current utterance is explored for the first-pass decoding in conversational speech recognition. The context information is captured by the hidden states of LSTM-LMs across utterance and can be used to guide the first-pass search effectively. The experimental results in our internal meeting transcription system show that significant performance improvements can be obtained by incorporating the contextual information with LSTM-LMs in the first-pass decoding, compared to applying the contextual information in the second-pass rescoring.

show abstract

Recurrent neural network language model adaptation for multi-genre broadcast speech recognition

Cited by 50 publications

References 28 publications

The 2015 sheffield system for transcription of Multi-Genre Broadcast media

The 2015 sheffield system for transcription of Multi-Genre Broadcast media

Domain-Aware Neural Language Models for Speech Recognition

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition

Contact Info

Product

Resources

About