Unsupervised Training on Large Amounts of Broadcast News Data

Ma, Junxia; Matsoukas, Spyros; Kimball, Owen; Schwartz, Richard

doi:10.1109/icassp.2006.1660839

Cited by 48 publications

(57 citation statements)

References 6 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Semi-supervised training has been effectively used to train acoustic models in several languages and conditions [32,33,34,35,36]. This section discusses the application of these approaches to low-resource settings.…”

Section: Semi-supervised Trainingmentioning

confidence: 99%

Deep neural network features and semi-supervised training for low resource speech recognition

Thomas

Seltzer

Church³

et al. 2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

138

100

View full text Add to dashboard Cite

We propose a new technique for training deep neural networks (DNNs) as data-driven feature front-ends for large vocabulary continuous speech recognition (LVCSR) in low resource settings. To circumvent the lack of sufficient training data for acoustic modeling in these scenarios, we use transcribed multilingual data and semi-supervised training to build the proposed feature front-ends. In our experiments, the proposed features provide an absolute improvement of 16% in a low-resource LVCSR setting with only one hour of in-domain training data. While close to three-fourths of these gains come from DNN-based features, the remaining are from semi-supervised training.

show abstract

Section: Semi-supervised Trainingmentioning

confidence: 99%

Deep neural network features and semi-supervised training for low resource speech recognition

Thomas

Seltzer

Church³

et al. 2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

138

100

View full text Add to dashboard Cite

show abstract

“…In low resource scenarios, we seek multi-lingual and semi-supervised methods to leverage more easily acquired high-resource or untranscribed speech to improve our ASR performance with minimal cost. Two avenues were explored in the workshop: (i) a multi-lingual corpus was used to train a data-driven, language-invariant front-end for low-resource recognition; and (ii) untranscribed speech audio was automatically transcribed and used to augment to the labeled data for training, a procedure known as self-training [33,34]. For (i), discriminative deep neural network (DNN) pre-training [35] was performed on a multilingual corpus consisting of 31 hours of German/Spanish and only a single hour of English.…”

Section: Data-driven Front-ends and Selective Self Supervisionmentioning

confidence: 99%

A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition

Jansen¹,

Dupoux

Goldwater

et al. 2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

112

View full text Add to dashboard Cite

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.

show abstract

“…In addition unsupervised [9] or lightlysupervised [10] training is another type of popular strategy which could enlarge the size of target language data quickly and cheaply [11] [12]. The multilingual or cross-lingual approaches borrow data from the V2 (transcribed non-target language data), and the unsupervised training usually develop technologies to borrow data from the untranscribed target language data, i.e.…”

Section: Target_langmentioning

confidence: 99%

Combination of data borrowing strategies for low-resource LVCSR

Liu

2013

2013 IEEE Workshop on Automatic Speech Recognition and Understanding

View full text Add to dashboard Cite

Large vocabulary continuous speech recognition (LVCSR) is particularly difficult for low-resource languages, where only very limited manually transcribed data are available. However, it is often feasible to obtain large amount of untranscribed data of the low-resource target language or sufficient transcribed data of some non-target languages. Borrowing data from these additional sources to help LVC-SR for low-resource language becomes an important research direction. This paper presents an integrated data borrowing framework in this scenario. Three data borrowing approaches were first investigated in detail, including feature, model and data corpus. They borrow data at different levels from additional sources, and all get substantial performance improvements. As these strategies work independently, the obtained gains are likely additive. The three strategies are then combined to form an integrated data borrowing framework. Experiments showed that with the integrated data borrowing framework, significant improvement of more than 10% absolute WER reduction over a conventional baseline was obtained. In particular, the gain under the extreme limited low-resource scenario is 16%.

show abstract

Unsupervised Training on Large Amounts of Broadcast News Data

Cited by 48 publications

References 6 publications

Deep neural network features and semi-supervised training for low resource speech recognition

Deep neural network features and semi-supervised training for low resource speech recognition

A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition

Combination of data borrowing strategies for low-resource LVCSR

Contact Info

Product

Resources

About