Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems’ Hypotheses

Li, Sheng; Akita, Yuya; Kawahara, Tatsuya

doi:10.1109/taslp.2016.2562505

Cited by 12 publications (8 citation statements)

References 60 publications (53 reference statements)

“…Yet another possibility to improve the semi-supervised training is to use the multi-system transcripts from system combination [13], or for the 'agreement analysis' [14]. Also having the 'captions' available can be helpful [15,16].…”

Section: Introduction (mentioning)

confidence: 99%

Semi-Supervised DNN Training with Word Selection for ASR

2017

Not all questions related to semi-supervised training of hybrid ASR systems with DNN acoustic models have been investigated in depth. In this paper, we focus on the granularity of confidences (per-sentence, per-word, per-frame) and on how the data should be used (data selection by masks, or in mini-batch SGD with confidences as weights). Then, we propose to re-tune the system with the manually transcribed data, both with the 'frame CE' training and 'sMBR' training. Our preferred semi-supervised recipe, which is both simple and efficient, is as follows: we select words according to the word accuracy we obtain on the development set. This recipe, which does not rely on a grid search of the training hyperparameters, generalized well for: Babel Vietnamese (transcribed 11h, untranscribed 74h), Babel Bengali (transcribed 11h, untranscribed 58h) and our custom Switchboard setup (transcribed 14h, untranscribed 95h). We obtained absolute WER improvements of 2.5% for Vietnamese, 2.3% for Bengali and 3.2% for Switchboard.
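The word-selection idea in this abstract can be illustrated with a minimal sketch. It assumes one possible reading of "select words according to the word accuracy on the development set": keep the fraction of automatically transcribed words, ranked by confidence, that equals the development-set word accuracy. The function name `select_words`, the `(word, confidence)` input format, and this interpretation are all assumptions made for illustration, not the cited paper's implementation.

```python
from typing import List, Tuple

# Hypothetical input format: (hypothesized word, confidence in [0, 1]).
Word = Tuple[str, float]


def select_words(words: List[Word], dev_word_accuracy: float) -> List[bool]:
    """Return a keep/drop mask over automatically transcribed words.

    Assumption: the fraction of words kept equals the word accuracy
    measured on a development set, and the most confident words are kept.
    """
    if not 0.0 <= dev_word_accuracy <= 1.0:
        raise ValueError("dev_word_accuracy must lie in [0, 1]")

    # Number of words to keep, e.g. 60% accuracy keeps 60% of the words.
    n_keep = round(len(words) * dev_word_accuracy)

    # Rank word positions by descending confidence.
    order = sorted(range(len(words)), key=lambda i: words[i][1], reverse=True)
    keep = set(order[:n_keep])

    # The mask follows the original word order, so it can be applied
    # directly to the transcript (or expanded to frames) during training.
    return [i in keep for i in range(len(words))]


if __name__ == "__main__":
    hyp = [("a", 0.9), ("b", 0.2), ("c", 0.8), ("d", 0.5), ("e", 0.4)]
    print(select_words(hyp, dev_word_accuracy=0.6))
```

A mask like this corresponds to the "data selection by masks" option mentioned in the abstract; the alternative it names, using confidences as per-sample weights in mini-batch SGD, would keep every word and scale its contribution instead.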

“…small amount of training data as in MALORCA) [23], [24], [25], [26]. For acoustic modeling, researchers have applied various data-selection schemes to utilize the additional unlabeled data [27], [28], [29], [30], [31], [32]. In this paper, we apply a technique built specifically to account for semantics of the ATM domain [27].…”

Section: B Supervised and Unsupervised Learning (mentioning)

confidence: 99%

Semi-supervised Adaptation of Assistant Based Speech Recognition Models for different Approach Areas

Kleinert, Helmke, Siol, et al. 2018

2018 IEEE/AIAA 37th Digital Avionics Systems Conference (DASC)

Air Navigation Service Providers (ANSPs) are replacing paper flight strips with various digital solutions. The commands issued by air traffic controllers (ATCos) are then available in computer-readable form. However, those systems require manual controller inputs, i.e. ATCos' workload increases. The Active Listening Assistant (AcListant®) project has shown that Assistant Based Speech Recognition (ABSR) is a potential solution to reduce this additional workload. However, the development of an ABSR application for a specific target domain usually requires a large amount of manually transcribed audio data in order to achieve task-sufficient recognition accuracies. The MALORCA project developed an initial basic ABSR system and semi-automatically tailored its recognition models for both Prague and Vienna approaches by machine learning from automatically transcribed audio data. Command recognition error rates were reduced from 7.9% to under 0.6% for Prague and from 18.9% to 3.2% for Vienna.

“…More complex data selection methods were also proposed in SSL data selection. In [8], multiple ASR systems were trained to automatically transcribe the speech data, and a cascade of the conditional random field models were used to combine the ASR hypotheses from different systems and judge the reliability of the automatically transcribed data. [9] proposed the global entropy reduction maximization (GERM) method.…”

Section: Introduction (mentioning)

confidence: 99%

Acoustic Model Bootstrapping Using Semi-Supervised Learning

Chen, Leutnant 2019

Interspeech 2019

This work aims at bootstrapping acoustic model training for automatic speech recognition with small amounts of human-labeled speech data and large amounts of machine-labeled speech data. Semi-supervised learning is investigated to select the machine-transcribed training samples. Two semi-supervised learning methods are proposed: one is the local-global uncertainty based method, which introduces both the local uncertainty from the current utterance and the global uncertainty from the whole data pool into the data selection; the other is margin-based data selection, which selects the utterances near the decision boundary through language model tuning. The experimental results based on a Japanese far-field automatic speech recognition system indicate that the acoustic model trained on automatically transcribed speech data achieves about 17% relative gain when in-domain human-annotated data is not available for initialization, while a 3.7% relative gain is obtained when the initial acoustic model is trained on a small amount of in-domain data.
