2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178782

An investigation into speaker informed DNN front-end for LVCSR

Abstract: Deep Neural Networks (DNNs) have become a standard method in many ASR tasks. Recently there has been considerable interest in "informed training" of DNNs, where the DNN input is augmented with auxiliary codes such as i-vectors, speaker codes, speaker separation bottleneck (SSBN) features, etc. This paper compares different speaker-informed DNN training methods on an LVCSR task. We discuss the mathematical equivalence between speaker-informed DNN training and "bias adaptation", which uses speaker-dependent biases, and give detaile…
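To make the idea concrete, here is a minimal NumPy sketch (hypothetical code, not from the paper) of a speaker-informed first hidden layer: the auxiliary i-vector is appended to every acoustic frame, and because it is constant across a speaker's frames its contribution collapses into a speaker-dependent bias, which is the equivalence the abstract alludes to. The function and weight names are illustrative assumptions.

```python
import numpy as np

def informed_forward(frames, ivector, W_ac, W_aux, b, nonlin=np.tanh):
    """First hidden layer of a speaker-informed DNN (illustrative sketch).

    frames : (T, D) acoustic features for one speaker
    ivector: (K,)   speaker-level auxiliary code (e.g. an i-vector)
    W_ac   : (H, D) weights on the acoustic input
    W_aux  : (H, K) weights on the auxiliary input
    b      : (H,)   shared bias
    """
    # Augmenting every frame with the same speaker-level i-vector ...
    aug = np.hstack([frames, np.tile(ivector, (frames.shape[0], 1))])
    W = np.hstack([W_ac, W_aux])
    h_informed = nonlin(aug @ W.T + b)

    # ... is mathematically the same as adding a speaker-dependent bias
    # W_aux @ ivector to the pre-activation of a plain acoustic DNN
    # ("bias adaptation" in the abstract's terminology).
    speaker_bias = W_aux @ ivector
    h_bias_adapt = nonlin(frames @ W_ac.T + b + speaker_bias)

    assert np.allclose(h_informed, h_bias_adapt)
    return h_informed
```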

Cited by 17 publications (8 citation statements)
References 23 publications
“…The non-interpolated training PPL for LM 2 RNNLM was found to be 101.4 using the K-component topic-based finetuning compared to 106.7 for using text LDA adaptation and 119.2 when not using adaptation (baseline), whereas the non-interpolated test PPL was: 157.2, 136.4 and 150.8 respectively, which shows overfitting of the K-Component RNNLM. Our results with the introduction of a domain-specific adaptation layer showed that using an adaptation layer with additive bias adaptation (feature-based adaptation layer), better results are obtained than when using a multiplicative transform (LHN adaptation layer), which is in line with similar observations in acoustic modelling [52]. For LM 1&LM 2 RNNLMs, an additive transform gives a WER of 28.7% whereas using a multiplicative transform leads to a WER of 28.9%.…”
Section: Feature-based Adaptation Results (supporting)
confidence: 87%
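For readers comparing the two adaptation forms discussed in this excerpt, the following NumPy fragment is a minimal sketch, under assumed shapes, of an additive (feature-based/bias) adaptation layer versus a multiplicative (LHN-style) transform; it is illustrative only and not code from either cited work.

```python
import numpy as np

def additive_adaptation(h, bias_d):
    """Feature-based / bias adaptation: add a domain-dependent offset
    to the hidden activations (one bias vector per domain)."""
    return h + bias_d                 # h: (T, H), bias_d: (H,)

def lhn_adaptation(h, A_d):
    """LHN-style adaptation: multiply the hidden activations by a
    domain-dependent linear transform (one H x H matrix per domain)."""
    return h @ A_d.T                  # A_d: (H, H)

# The additive variant introduces only H extra parameters per domain,
# whereas the LHN transform introduces H*H, which is one reason the
# bias form can be easier to estimate from limited adaptation data.
```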
“…Making complex ASR systems available was originally the intent of webASR, and, as such, ASR remains the main task in 3 newly developed systems covering 3 domains. All of them present a state-of-the-art speech transcription system, based on the latest research carried out at the University of Sheffield in topics such as Deep Neural Network (DNN) acoustic modelling [11,12,13,14], distant microphone recognition [15], adaptation to noisy environments [16,17,18], domain adaptation [19,20], Recurrent Neural Network (RNN) language modelling [21], Nbest re-ranking [22,23] and sentence-end detection [24,25].…”
Section: Transcription Systems (mentioning)
confidence: 99%
“…Finally, domain information derived from the LDA model with K domains is encoded with a K-dimensional one-hot vector called Unique Binary Index Code (UBIC) [14]. UBIC indicates the most likely domain of the utterance using the posterior domain probability.…”
Section: LDA-DNN Adaptation (mentioning)
confidence: 99%
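A minimal sketch of how a UBIC-style one-hot code could be derived from LDA domain posteriors is shown below; the helper name and shapes are assumptions for illustration, not the cited authors' implementation.

```python
import numpy as np

def ubic(domain_posteriors):
    """Unique Binary Index Code: a K-dimensional one-hot vector marking
    the most likely LDA domain of an utterance.

    domain_posteriors : (K,) posterior probability of each domain
    """
    code = np.zeros_like(domain_posteriors)
    code[np.argmax(domain_posteriors)] = 1.0
    return code

# Example: with posteriors [0.1, 0.7, 0.2] the UBIC is [0., 1., 0.],
# which would then be appended to the DNN input features.
```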
“…Subspace methods: a speaker/environment subspace is estimated and then neurons' weights or transformations are computed, based on the subspace representation of the speaker/environment. Principal Component Analysis (PCA) based adaptation approach [12], i-Vector based speaker-aware training [13] or speaker-aware DNNs [14] can be considered as subspace methods.…”
Section: Introduction (mentioning)
confidence: 99%
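As a rough illustration of the subspace idea described in this excerpt, the sketch below uses scikit-learn's PCA to build a low-dimensional speaker subspace from per-speaker summary vectors and treats the projection as an auxiliary speaker code; this is a simplified, assumption-laden example, not the estimation procedure used in the cited papers.

```python
import numpy as np
from sklearn.decomposition import PCA

def speaker_subspace_codes(speaker_stats, n_dims=10):
    """Project per-speaker summary vectors (e.g. mean acoustic features
    or supervectors) onto a low-dimensional subspace estimated by PCA.

    speaker_stats : (S, D) one summary vector per training speaker
    returns       : the fitted PCA model and (S, n_dims) speaker codes
    """
    pca = PCA(n_components=n_dims)
    codes = pca.fit_transform(speaker_stats)
    return pca, codes

# At test time a new speaker's summary vector is projected with
# pca.transform(...) and the resulting code is fed to the DNN as an
# auxiliary input, analogous to an i-vector in speaker-aware training.
```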