2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.2004.1326091

Improving broadcast news transcription by lightly supervised discriminative training

Abstract: In this paper, we present our experiments on lightly supervised discriminative training with large amounts of broadcast news data for which only closed caption transcriptions are available (TDT data). In particular, we use language models biased to the closed-caption transcripts to recognise the audio data, and the recognised transcripts are then used as the training transcriptions for acoustic model training. A range of experiments that use maximum likelihood (ML) training as well as discriminative training ba…
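The biasing step the abstract describes can be sketched as a simple language-model interpolation. The code below is an illustrative assumption, not the paper's implementation: it uses a unigram simplification, and the names `unigram_lm`, `biased_prob`, and the weight `lam` are hypothetical.

```python
from collections import Counter

def unigram_lm(tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def biased_prob(word, general_lm, caption_lm, lam=0.5, floor=1e-6):
    """Linearly interpolate a general LM with a closed-caption LM.

    lam controls how strongly recognition is biased toward the captions;
    floor is a crude stand-in for proper smoothing of unseen words.
    """
    return (1 - lam) * general_lm.get(word, floor) + lam * caption_lm.get(word, floor)

# Toy example: "anchor" occurs only in the closed captions, so the biased
# model assigns it far more probability mass than the general model does.
general = unigram_lm("the news at nine the weather".split())
captions = unigram_lm("the anchor reads the news".split())
p_biased = biased_prob("anchor", general, captions, lam=0.5)
p_general = biased_prob("anchor", general, captions, lam=0.0)
```

In a real system the interpolation would be done over n-gram models inside the decoder, but the effect is the same: words and phrasings from the closed captions become much easier to recognise.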

Cited by 75 publications (59 citation statements)
References 9 publications (16 reference statements)
“…Unsupervised training is similar to lightly supervised training except that a general recognition language model is used, rather than the biased language model for lightly supervised training. This reduces the accuracy of the transcriptions [23], but has been successfully applied to a range of tasks [56,110]. However, the gains from unsupervised approaches can decrease dramatically when discriminative training such as MPE is used.…”
Section: Lightly Supervised and Unsupervised Training
confidence: 99%
“…These transcriptions are known to be errorful and thus not suitable for direct use when training detailed acoustic models. However, a number of lightly supervised training techniques have been developed to overcome this [23,94,128].…”
Section: Lightly Supervised and Unsupervised Training
confidence: 99%
“…The conventional filtering method, however, has a drawback that it significantly reduces the amount of usable training data. Moreover, it is presumed that the unmatched or less confident segments of the data are more useful than the matched segments because the baseline system failed to recognize them and may be improved with additional training [12]. Recent work by Long et al [14] proposed methods to improve the filtering by considering the phone error rate and confidence measures.…”
Section: Introduction
confidence: 99%
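The conventional filtering the passage above criticises can be sketched as follows: recognise each segment, measure agreement with the closed caption, and drop low-agreement segments. This is a hedged illustration, not any cited system; the `difflib`-based agreement measure and the 0.8 threshold are assumptions standing in for a proper recogniser-side alignment and confidence score.

```python
from difflib import SequenceMatcher

def word_agreement(hyp_words, ref_words):
    """Word-level agreement between the recognised hypothesis and the
    closed-caption reference, via a matching-subsequence ratio in [0, 1]."""
    if not hyp_words and not ref_words:
        return 1.0
    return SequenceMatcher(None, hyp_words, ref_words).ratio()

def filter_segments(segments, threshold=0.8):
    """Keep only segments whose recognised transcript agrees closely with
    the caption; mismatched segments are discarded from training."""
    return [seg for seg in segments
            if word_agreement(seg["hyp"], seg["ref"]) >= threshold]

segments = [
    {"hyp": "the markets closed higher today".split(),
     "ref": "the markets closed higher today".split()},
    {"hyp": "whether report follows the brake".split(),
     "ref": "the weather report follows the break".split()},
]
kept = filter_segments(segments)
```

The second segment is exactly the kind of data the passage argues is most valuable: the recogniser got it wrong, so hard filtering throws it away, which is the drawback that phone-error-rate and confidence-based selection [14] tries to address.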
“…In order to increase the training data for an acoustic model, a scheme of lightly supervised training, which does not require faithful transcripts but exploits available verbatim texts, has been explored for broadcast news [10]–[12].…”
Section: Introduction
confidence: 99%