Interspeech 2017
DOI: 10.21437/interspeech.2017-519
Large-Scale Domain Adaptation via Teacher-Student Learning

Abstract: High accuracy speech recognition requires a large amount of transcribed data for supervised training. In the absence of such data, domain adaptation of a well-trained acoustic model can be performed, but even here, high accuracy usually requires significant labeled data from the target domain. In this work, we propose an approach to domain adaptation that does not require transcriptions but instead uses a corpus of unlabeled parallel data, consisting of pairs of samples from the source domain of the well-trained model and the desired target domain. […]
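To make the mechanism concrete, here is a minimal PyTorch sketch of one T/S adaptation step under the setup the abstract describes: a frozen source-domain teacher sees source-domain features, the student sees the parallel target-domain features, and the student is trained to match the teacher's senone posteriors, so no transcriptions are needed. The model and tensor names are illustrative, not from the paper.

```python
# Minimal sketch of one teacher/student (T/S) adaptation step (assumed setup,
# not the authors' code). teacher and student are acoustic models that map
# feature frames to senone logits of the same dimensionality.
import torch
import torch.nn.functional as F

def ts_adaptation_step(teacher, student, optimizer, src_feats, tgt_feats):
    """src_feats / tgt_feats: parallel (B, T, D) feature tensors from the
    source and target domains; no transcriptions are used."""
    teacher.eval()
    with torch.no_grad():
        # Teacher posteriors over senones, computed on source-domain speech.
        p_teacher = F.softmax(teacher(src_feats), dim=-1)

    # Student posteriors on the parallel target-domain speech.
    log_p_student = F.log_softmax(student(tgt_feats), dim=-1)

    # T/S loss: cross-entropy of the student's posteriors against the
    # teacher's soft targets (equal to KL divergence up to a constant).
    loss = -(p_teacher * log_p_student).sum(dim=-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```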

Cited by 118 publications (86 citation statements)
References 25 publications
“…ASR suffers from performance degradation when a well-trained acoustic model is applied in a new domain [19]. T/S learning [3,8,9] and adversarial learning [20,21,22,23,24] are two effective approaches that can suppress this domain mismatch by adapting a source-domain acoustic model to target-domain speech. T/S learning is better suited to the situation where unlabeled parallel data is available for adaptation, in which a sequence of source-domain speech features is fed as the input to the source-domain teacher model and a parallel sequence of target-domain features is fed as the input to the target-domain student model, optimizing the student model parameters by minimizing the T/S loss in Eq.…”
Section: Conditional T/S Learning for Domain Adaptation (mentioning)
confidence: 99%
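The equation reference in the excerpt is truncated. In the T/S adaptation literature this loss is conventionally the frame-level cross-entropy between teacher and student senone posteriors over the parallel utterances; a reconstruction in that standard form (not necessarily the exact numbered equation being cited) is

\mathcal{L}_{TS} = -\sum_{t}\sum_{s} p_{T}\left(s \mid x_{t}^{\mathrm{src}}\right) \log p_{S}\left(s \mid x_{t}^{\mathrm{tgt}}\right)

where s ranges over senones, x_t^src and x_t^tgt are time-aligned source- and target-domain feature frames, and p_T and p_S are the teacher's and student's posterior distributions.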
“…T/S learning was first explored in the speech community [22,23] to distill knowledge from larger models into a smaller one, and was subsequently applied with success in ASR [24,25] and keyword spotting [26]. Rather than for knowledge distillation, we adopt T/S learning for domain adaptation, as proposed in [27] to build an ASR system that performs more robustly under multimedia noise. On top of this system, we apply logits selection, keeping only the k highest values, and experiment with multiple settings of the temperature T.…”
Section: Introduction (mentioning)
confidence: 99%
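A minimal sketch of the logits-selection idea in the excerpt, assuming the selection is implemented by masking all but the k largest teacher logits before a temperature-scaled softmax; the function name and tensor shapes are hypothetical.

```python
# Hypothetical sketch of top-k logits selection with temperature for T/S
# training: only the k largest teacher logits contribute to the soft targets;
# the rest are masked out before the temperature-scaled softmax.
import torch
import torch.nn.functional as F

def topk_soft_targets(teacher_logits, k=20, temperature=2.0):
    """teacher_logits: (B, T, S) senone logits from the teacher model."""
    topk_vals, topk_idx = teacher_logits.topk(k, dim=-1)
    # Mask non-top-k entries with -inf so they receive zero probability.
    masked = torch.full_like(teacher_logits, float("-inf"))
    masked.scatter_(-1, topk_idx, topk_vals)
    # Temperature-scaled softmax over the surviving logits.
    return F.softmax(masked / temperature, dim=-1)
```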
“…Because the T/S learning technique applied in this work does not require transcribed data, we also explore how much system performance can be further improved by gradually incorporating more training recordings. Finally, we study the effect of applying sequence training on top of T/S learning for domain adaptation, which was not reported in [27].…”
Section: Introduction (mentioning)
confidence: 99%
“…Soft labels should be more informative than hard labels, giving the student the ability to learn more complex functions [13]. We use the T/S training techniques from the prior work described in Section 1 [13,14,15]. Instead of using all 3183 senone probabilities, we use only the top 20, as described in [15].…”
Section: Teacher-Student Training (mentioning)
confidence: 99%
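A minimal sketch of the soft-label scheme in this last excerpt, assuming the top 20 of the 3183 teacher senone posteriors are kept and renormalized before the student's cross-entropy is computed; names and shapes are illustrative. Unlike the logits-selection sketch above, the selection here operates on the teacher's probabilities rather than its raw logits.

```python
# Sketch: keep only the teacher's 20 largest senone posteriors, renormalize
# them to sum to one, and train the student against these sparse soft labels.
import torch
import torch.nn.functional as F

def soft_label_loss(teacher_logits, student_logits, k=20):
    """Both logit tensors: (B, T, 3183) in the setting quoted above."""
    p_teacher = F.softmax(teacher_logits, dim=-1)
    topk_vals, topk_idx = p_teacher.topk(k, dim=-1)
    # Renormalize the surviving probability mass over the top-k senones.
    topk_vals = topk_vals / topk_vals.sum(dim=-1, keepdim=True)
    soft_targets = torch.zeros_like(p_teacher).scatter_(-1, topk_idx, topk_vals)
    log_p_student = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy against the sparse soft targets.
    return -(soft_targets * log_p_student).sum(dim=-1).mean()
```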