Cambridge University transcription systems for the multi-genre broadcast challenge

Woodland, Philip C.; Liu, X.; Ye, Qian; Zhang, C.; Gales, Mark J. F.; Karanasou, Penny; Lanchantin, Pierre; Wang, L.

doi:10.1109/asru.2015.7404856

Cited by 35 publications

(61 citation statements)

References 37 publications

“…• University of Cambridge (CU; (mi.eng.cam.ac.uk)) [25]: Primarily HTK-based hybrid DNN and tandem systems via joint decoding. Trained on 700 hrs (PMER=30%).…”

Section: Submitted Systems and Results (mentioning)

confidence: 99%

The MGB challenge: Evaluating multi-genre broadcast media recognition

Bell, Gales, Hain, et al. 2015

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

125

151

This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015, an evaluation focused on speech recognition, speaker diarization, and "lightly supervised" alignment of BBC TV recordings. The challenge training data covered the whole range of seven weeks BBC TV output across four channels, resulting in about 1,600 hours of broadcast audio. In addition several hundred million words of BBC subtitle text was provided for language modelling. A novel aspect of the evaluation was the exploration of speech recognition and speaker diarization in a longitudinal setting, i.e. recognition of several episodes of the same show, and speaker diarization across these episodes, linking speakers. The longitudinal tasks also offered the opportunity for systems to make use of supplied metadata including show title, genre tag, and date/time of transmission. This paper describes the task data and evaluation process used in the MGB challenge, and summarises the results obtained.

“…While for TDNNs, only the 40 dimensional log-Mel filter bank features were considered. For all experiments, the input features were normalised at the utterance level for mean and at the show-segment level for variance [25].…”

Section: Methods (mentioning)

confidence: 99%

“…To evaluate the generalisation performance of the trained models, a 158k word vocabulary trigram LM was used to decode the validation and test set. 1 Note that most results in [25] use a larger 700h training set, stronger language models and other setup differences. Training configuration for SGD: The best results with SGD were achieved through annealing of the learning rates at subsequent epochs.…”

Section: Methods (mentioning)

confidence: 99%

Combining Natural Gradient with Hessian Free Methods for Sequence Training

Haider, Woodland 2018

Interspeech 2018

This paper presents a new optimisation approach to train Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface that enables better paths on the parameter manifold to be traversed. The method has been applied within a Hessian Free (HF) style optimisation framework to sequence train both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. The efficacy of the method is shown using experiments on a Multi-Genre Broadcast (MGB) transcription task and neural networks using sigmoid and ReLU activation functions have been investigated. It is shown that for the same number of updates this proposed approach achieves larger reductions in the word error rate (WER) than both NG and HF, and also leads to a lower WER than standard stochastic gradient descent.
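The feature normalisation mentioned in the first citation statement from this paper (mean at the utterance level, variance at the show-segment level) can be illustrated with a minimal sketch. The function name, the nested-list data layout, and the choice to compute the segment variance after mean removal are assumptions made here for illustration; the quoted text does not specify them.

```python
from statistics import fmean, pstdev


def normalise_features(utterances):
    """Normalise the features of one show segment (hypothetical helper).

    utterances: list of utterances; each utterance is a list of frames;
    each frame is a list of floats (e.g. 40 log-Mel filter bank values).
    """
    dim = len(utterances[0][0])

    # Step 1: mean normalisation, computed separately for every utterance.
    centred = []
    for utt in utterances:
        means = [fmean(frame[d] for frame in utt) for d in range(dim)]
        centred.append([[frame[d] - means[d] for d in range(dim)] for frame in utt])

    # Step 2: variance normalisation, with statistics pooled over all
    # frames in the show segment (assumption: computed after step 1).
    pooled = [frame for utt in centred for frame in utt]
    # Fall back to 1.0 when a dimension is constant, to avoid dividing by zero.
    stds = [pstdev(frame[d] for frame in pooled) or 1.0 for d in range(dim)]

    return [[[frame[d] / stds[d] for d in range(dim)] for frame in utt]
            for utt in centred]
```

With this layout, every utterance ends up with zero mean in each dimension, while only the pooled segment (not each individual utterance) has unit variance.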

show abstract

“…The development data was used as the evaluation set in order to provide fair comparison with previous work [4,22,6]. For language model experiments, the LM 2 data was partitioned into a training and development set by selecting 90% of text for each programme for training and the remaining 10% for development, after shuffling the lines for each programme.…”

Section: Multi-genre Broadcast Challenge Data (mentioning)

confidence: 99%

Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary Features

Deena, Madhyastha, et al. 2017

Interspeech 2017

Recurrent neural network language models (RNNLMs) can be augmented with auxiliary features, which can provide an extra modality on top of the words. It has been found that RNNLMs perform best when trained on a large corpus of generic text and then fine-tuned on text corresponding to the sub-domain for which it is to be applied. However, in many cases the auxiliary features are available for the sub-domain text but not for the generic text. In such cases, semi-supervised techniques can be used to infer such features for the generic text data such that the RNNLM can be trained and then fine-tuned on the available in-domain data with corresponding auxiliary features. In this paper, several novel approaches are investigated for dealing with the semi-supervised adaptation of RNNLMs with auxiliary features as input. These approaches include: using zero features during training to mask the weights of the feature sub-network; adding the feature sub-network only at the time of fine-tuning; deriving the features using a parametric model; and back-propagating to infer the features on the generic text. These approaches are investigated and results are reported both in terms of PPL and WER on a multi-genre broadcast ASR task.
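The per-programme data partition described in the citation statement from this paper (shuffle each programme's lines, then take 90% for training and 10% for development) can be sketched as follows. The function name, the dictionary layout, and the fixed seed are assumptions for illustration only, not details taken from the paper.

```python
import random


def split_per_programme(lines_by_programme, train_fraction=0.9, seed=0):
    """Split each programme's text lines into train and development sets.

    lines_by_programme: dict mapping a programme name to its list of lines
    (hypothetical layout). Returns two dicts with the same keys.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, dev = {}, {}
    for programme, lines in lines_by_programme.items():
        shuffled = list(lines)      # copy, so the caller's list is untouched
        rng.shuffle(shuffled)       # shuffle within the programme
        cut = int(len(shuffled) * train_fraction)
        train[programme] = shuffled[:cut]
        dev[programme] = shuffled[cut:]
    return train, dev
```

Splitting inside each programme, rather than over the pooled text, keeps every programme represented in both sets in roughly the same proportion.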
