Interspeech 2017
DOI: 10.21437/Interspeech.2017-1385

Semi-Supervised DNN Training with Word Selection for ASR

Abstract: Not all questions related to the semi-supervised training of hybrid ASR systems with DNN acoustic models have been deeply investigated yet. In this paper, we focus on the question of the granularity of confidences (per-sentence, per-word, per-frame) and on the question of how the data should be used (data selection by masks, or mini-batch SGD with confidences as weights). We then propose to re-tune the system with the manually transcribed data, both with 'frame CE' training and with 'sMBR' training. Our preferred …
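As a rough illustration of the two data-usage options named in the abstract, here is a minimal PyTorch-style sketch, assuming per-frame confidences from a seed decoder; all names and thresholds are hypothetical, not the authors' code:

import torch
import torch.nn.functional as F

def weighted_frame_ce(logits, targets, conf, threshold=None):
    """Frame-level CE over automatically transcribed data.

    conf: per-frame confidences in [0, 1] from the seed decoder.
    threshold=None -> confidences used directly as weights in mini-batch SGD;
    threshold=t    -> 0/1 masking, i.e. data selection by masks.
    Both branches are illustrative assumptions.
    """
    per_frame = F.cross_entropy(logits, targets, reduction="none")  # shape (T,)
    w = (conf >= threshold).float() if threshold is not None else conf
    return (w * per_frame).sum() / w.sum().clamp(min=1e-8)

# toy usage: T frames, K senone classes
T, K = 8, 10
logits = torch.randn(T, K)
targets = torch.randint(0, K, (T,))
conf = torch.rand(T)
loss_soft = weighted_frame_ce(logits, targets, conf)        # confidences as weights
loss_mask = weighted_frame_ce(logits, targets, conf, 0.7)   # 0/1 mask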

Cited by 21 publications (24 citation statements: 1 supporting, 23 mentioning, 0 contrasting); citing publications span 2018–2024. References 21 publications.

“…Then, we retrained the model with an additional embedding layer g(·) using the semi-supervised loss with paired si84 and unpaired si284, following Algorithm 1. As seen in [8]–[10], we observed that retraining always gives better results than training from random weights. We searched for the best hyperparameters α, β ∈ [0.5, 0.9] on the dev93 set.…”
Section: Settings (supporting)
confidence: 62%
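A minimal sketch of the hyperparameter search described in this excerpt, assuming a simple weighted combination of the paired and unpaired loss terms (the excerpt does not spell out the exact loss form, and evaluate_dev93 is a stub standing in for a dev93 WER evaluation):

import itertools

def evaluate_dev93(alpha, beta):
    # Placeholder: in the real setup this would train with an assumed
    # combined loss L = alpha * L_paired(si84) + beta * L_unpaired(si284)
    # and return dev93 WER; stubbed here with a dummy error surface.
    return abs(alpha - 0.7) + abs(beta - 0.6)

grid = [0.5, 0.6, 0.7, 0.8, 0.9]  # alpha, beta in [0.5, 0.9]
alpha, beta = min(itertools.product(grid, grid),
                  key=lambda ab: evaluate_dev93(*ab))
print(alpha, beta)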
“…According to [5], careful transcription costs 20 hours of human effort for each hour of speech. To reduce this effort, many researchers have developed semi-supervised training methods for ASR systems [6]–[10], since large amounts of unpaired data can be obtained easily.…”
Section: Introduction (mentioning)
confidence: 99%
“…In our scenario we consider adding the 'CC-untran' data (untranscribed data from the target domain = contact centers), or the 'Parl' data (imperfectly transcribed parliament data from a different domain). We train either with 'masking' (scaling gradients in NN training with 0/1 per-frame weights) [7], or with 're-segmentation' (selecting sub-segments with reliable transcripts) [6]. We begin by constructing a seed system, which we use for decoding automatic transcripts and filtering the imperfect transcripts.…”
Section: Methods (mentioning)
confidence: 99%
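A minimal sketch of the 're-segmentation' strategy mentioned in this excerpt, assuming (word, start, end, confidence) tuples from the seed decoder; the thresholds are illustrative, not values from [6]:

def select_subsegments(words, min_conf=0.9, min_words=3):
    """Keep maximal runs of consecutive words whose confidence is at
    least min_conf, dropping runs shorter than min_words.
    words: list of (word, start_s, end_s, conf) tuples."""
    segments, run = [], []
    for w in words:
        if w[3] >= min_conf:
            run.append(w)
        else:
            if len(run) >= min_words:
                segments.append(run)
            run = []
    if len(run) >= min_words:
        segments.append(run)
    # each kept run becomes a new training segment (start, end, transcript)
    return [(r[0][1], r[-1][2], " ".join(w[0] for w in r)) for r in segments]

hyp = [("hello", 0.0, 0.4, 0.98), ("wrld", 0.4, 0.7, 0.41),
       ("how", 0.7, 0.9, 0.95), ("are", 0.9, 1.1, 0.97), ("you", 1.1, 1.3, 0.99)]
print(select_subsegments(hyp))  # -> [(0.7, 1.3, 'how are you')]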
“…With untranscribed data, the situation is more difficult. The data can be used in semi-supervised training [7, 3, 8, 9, 10, 11, 12, 13]. Here, we need to identify the most reliable parts of the automatically generated transcripts to include in the training.…”
Section: Introduction (mentioning)
confidence: 99%
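As a sketch of such reliability selection, assuming word-level confidences with time alignments, per-word decisions can be mapped to per-frame training weights that feed a masked loss like the one after the abstract; the frame shift and threshold are assumptions for illustration:

def frame_weights_from_words(words, n_frames, frame_shift=0.01, min_conf=0.9):
    """Map per-word confidences to per-frame 0/1 weights: frames inside
    words with confidence >= min_conf get weight 1, all others 0.
    words: list of (word, start_s, end_s, conf) tuples."""
    w = [0.0] * n_frames
    for word, start_s, end_s, conf in words:
        if conf >= min_conf:
            lo = int(start_s / frame_shift)
            hi = min(n_frames, int(end_s / frame_shift) + 1)
            for t in range(lo, hi):
                w[t] = 1.0
    return w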