2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
DOI: 10.1109/icassp.2006.1660839

Unsupervised Training on Large Amounts of Broadcast News Data

Abstract: This paper presents our recent effort that aims at improving our Arabic Broadcast News (BN) recognition system by using thousands of hours of un-transcribed Arabic audio in the way of unsupervised training. Unsupervised training is first carried out on the 1,900-hour English Topic Detection and Tracking (TDT) data and is compared with the lightly-supervised training method that we have used for the DARPA EARS evaluations. The comparison shows that unsupervised training produces a 21.7% relative reduction in wo…
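For context on the headline number: a relative word error rate (WER) reduction is measured against the baseline WER, not as an absolute difference. A minimal sketch of the arithmetic in Python, with illustrative numbers that are not taken from the paper:

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Improvement expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer

# Illustrative numbers only (not from the paper): a baseline WER of 20.0%
# falling to 15.66% is a 21.7% relative reduction, even though the
# absolute drop is only 4.34 points.
print(f"{relative_wer_reduction(20.0, 15.66):.1%}")  # -> 21.7%
```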

Cited by 48 publications (57 citation statements)
References 6 publications
“…Semi-supervised training has been effectively used to train acoustic models in several languages and conditions [32,33,34,35,36]. This section discusses the application of these approaches to low-resource settings.…”
Section: Semi-supervised Training (mentioning)
confidence: 99%
“…In low resource scenarios, we seek multi-lingual and semi-supervised methods that leverage more easily acquired high-resource or untranscribed speech to improve our ASR performance at minimal cost. Two avenues were explored in the workshop: (i) a multi-lingual corpus was used to train a data-driven, language-invariant front-end for low-resource recognition; and (ii) untranscribed speech audio was automatically transcribed and used to augment the labeled training data, a procedure known as self-training [33,34]. For (i), discriminative deep neural network (DNN) pre-training [35] was performed on a multilingual corpus consisting of 31 hours of German/Spanish and only a single hour of English.…”
Section: Data-driven Front-ends and Selective Self-Supervision (mentioning)
confidence: 99%
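The self-training loop referred to in the statement above is a decode-filter-retrain cycle: train a seed model on the manual transcripts, decode the untranscribed audio, keep only confidently recognized utterances, and retrain on the enlarged set. A minimal sketch, assuming hypothetical helpers `train_acoustic_model` and `decode_with_confidence` and an arbitrary confidence threshold (none of these names or values come from the cited papers):

```python
def self_train(labeled, unlabeled, rounds=3, conf_threshold=0.9):
    """labeled: list of (audio, transcript) pairs; unlabeled: list of audio.

    train_acoustic_model and decode_with_confidence are hypothetical
    stand-ins for a real ASR toolkit's training and decoding entry points.
    """
    model = train_acoustic_model(labeled)
    for _ in range(rounds):
        auto = []
        for audio in unlabeled:
            # Automatically transcribe with the current model.
            text, conf = decode_with_confidence(model, audio)
            # Keep only hypotheses the model is confident about.
            if conf >= conf_threshold:
                auto.append((audio, text))
        # Retrain on the union of manual and automatic transcripts.
        model = train_acoustic_model(labeled + auto)
    return model
```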
“…In addition, unsupervised [9] or lightly-supervised [10] training is another popular type of strategy that can enlarge the amount of target-language data quickly and cheaply [11,12]. The multilingual or cross-lingual approaches borrow data from the V2 (transcribed non-target language data), while unsupervised training usually develops technologies to borrow data from the untranscribed target language data, i.e.…”
Section: Target_lang (mentioning)
confidence: 99%
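The distinction this last statement draws maps onto two data-borrowing recipes: lightly-supervised training has rough transcripts available (e.g., broadcast closed captions) and can filter decoder output by agreement with them, whereas fully unsupervised training must rely on confidence scores alone, as in the sketch above. Below is a runnable sketch of a caption-agreement filter of the kind used in lightly-supervised setups; the helper names and the 0.8 threshold are illustrative assumptions, not from the cited works:

```python
from difflib import SequenceMatcher

def caption_agreement(hypothesis: str, caption: str) -> float:
    """Word-level similarity between a decoded hypothesis and the caption text."""
    return SequenceMatcher(None, hypothesis.split(), caption.split()).ratio()

def filter_segments(decoded_segments, captions, threshold=0.8):
    """Keep segments whose automatic transcript closely matches the caption.

    decoded_segments and captions are parallel lists of strings; the 0.8
    threshold is an arbitrary illustrative choice.
    """
    return [
        (hyp, cap)
        for hyp, cap in zip(decoded_segments, captions)
        if caption_agreement(hyp, cap) >= threshold
    ]

# Example: only the first segment agrees well enough to keep.
print(filter_segments(
    ["the president said today", "weather forecast unclear"],
    ["the president said today", "sports scores follow"],
))
```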