Interspeech 2018
DOI: 10.21437/interspeech.2018-1424
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition

Abstract: Acoustic-to-word speech recognition based on attention-based encoder-decoder models achieves better accuracy with much lower latency than conventional speech recognition systems. However, acoustic-to-word models require a very large amount of training data, which is difficult to prepare for a new domain such as elderly speech. To address this problem, we propose domain adaptation based on transfer learning with layer freezing. Layer freezing first pre-trains a network with the source domain data, and …
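The layer-freezing recipe described in the abstract can be sketched in a few lines of PyTorch. Everything below is illustrative only: the module names (encoder, decoder), layer sizes, and the choice to freeze the encoder while fine-tuning the rest are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of transfer learning with layer freezing (PyTorch).
# Architecture and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class Seq2SeqASR(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab_size=10000):
        super().__init__()
        # Encoder: maps acoustic feature frames to hidden representations.
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=4, batch_first=True)
        # Output projection standing in for the attention decoder.
        self.decoder = nn.Linear(hidden, vocab_size)

    def forward(self, feats):
        enc_out, _ = self.encoder(feats)
        return self.decoder(enc_out)

model = Seq2SeqASR()
# Step 1: pre-train on the large source-domain corpus (training loop omitted).
# Step 2: freeze the pre-trained encoder before adapting to the target domain.
for param in model.encoder.parameters():
    param.requires_grad = False
# Step 3: fine-tune only the unfrozen parameters on the target-domain data.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

Freezing shrinks the set of trainable parameters, which is what makes adaptation feasible on small target-domain sets such as elderly speech.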

Cited by 7 publications (6 citation statements)
References 20 publications
“…Most adaptation technologies discussed in this paper can also be applied to domain adaptation [154], [232]-[235]. When the amount of adaptation data is limited, a common practice is adapting only part of the network's layers [236]. To let the adapted model still perform well on the source domain, Moriya et al. [237] proposed progressive neural networks, which add an additional model column to the original model for each new domain and update only the new model column with the new-domain data.…”
Section: Domain Adaptation
confidence: 99%
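As a rough illustration of the progressive-network idea in the statement above, the sketch below freezes a source-domain column and trains only a new column that receives a lateral connection from the frozen one. The two-layer columns and the single lateral adapter are illustrative assumptions, not the configuration used in [237].

```python
# Hedged sketch of a progressive neural network: one frozen source column,
# one trainable new-domain column with a lateral connection between them.
import torch
import torch.nn as nn

class Column(nn.Module):
    def __init__(self, in_dim, hidden, out_dim, lateral_dim=None):
        super().__init__()
        self.layer1 = nn.Linear(in_dim, hidden)
        # Lateral adapter from the previous (frozen) column, if any.
        self.lateral = nn.Linear(lateral_dim, hidden) if lateral_dim else None
        self.layer2 = nn.Linear(hidden, out_dim)

    def forward(self, x, prev_hidden=None):
        h = torch.relu(self.layer1(x))
        if self.lateral is not None and prev_hidden is not None:
            # Mix in the frozen column's activations.
            h = h + torch.relu(self.lateral(prev_hidden))
        return self.layer2(h), h

col_src = Column(in_dim=80, hidden=256, out_dim=500)
for p in col_src.parameters():      # source column is never updated again
    p.requires_grad = False

col_new = Column(in_dim=80, hidden=256, out_dim=500, lateral_dim=256)

x = torch.randn(8, 80)              # a batch of feature vectors
_, h_src = col_src(x)               # frozen source-column activations
logits, _ = col_new(x, prev_hidden=h_src)
```

Because the source column is frozen, source-domain performance cannot degrade; only the new column's parameters are updated for each new domain.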
“…In this paper, we use the attention-based encoder-decoder model [23][24][25] for an end-to-end ASR system. Our implementation of the model is based on [4,8] and is summarized in this section.…”
Section: Attention-Based Encoder-Decoder Model for Automatic Speech Recognition
confidence: 99%
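For readers unfamiliar with the mechanism, the sketch below shows the core attention computation in an encoder-decoder model: alignment scores between the decoder state and each encoder frame, followed by a softmax-weighted sum that yields the context vector. Plain dot-product scoring is used for brevity; the systems in [4,8] typically use more elaborate (e.g. location-aware) attention.

```python
# Minimal dot-product attention for an encoder-decoder ASR sketch.
import torch

def attend(dec_state, enc_states):
    """dec_state: (batch, hidden); enc_states: (batch, time, hidden)."""
    # Alignment score between the decoder state and every encoder frame.
    scores = torch.bmm(enc_states, dec_state.unsqueeze(-1)).squeeze(-1)
    weights = torch.softmax(scores, dim=-1)            # (batch, time)
    # Context vector: attention-weighted sum of encoder states.
    context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
    return context, weights

enc = torch.randn(4, 100, 256)   # 100 encoded frames per utterance
dec = torch.randn(4, 256)        # current decoder state
context, weights = attend(dec, enc)
print(context.shape, weights.shape)  # torch.Size([4, 256]) torch.Size([4, 100])
```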
“…Recent automatic speech recognition (ASR) systems can map acoustic features directly to word sequences; this approach, called acoustic-to-word (A2W) end-to-end ASR, is based on a fully neural network (FNN)-based architecture [1][2][3][4][5][6][7][8]. Unfortunately, end-to-end ASR systems are not robust to out-of-vocabulary (OOV) words because the number of NN outputs, which correspond to word entries, is fixed.…”
Section: Introduction
confidence: 99%
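The fixed-vocabulary limitation mentioned above can be seen directly: an A2W model has one output unit per training-vocabulary word, so any word outside that list can only surface as an unknown token. The tiny vocabulary below is a made-up example.

```python
# Made-up example of a fixed word vocabulary and the resulting OOV collapse.
vocab = {"<unk>": 0, "hello": 1, "world": 2, "speech": 3}

def to_ids(words):
    # Words absent from the fixed vocabulary fall back to the <unk> id.
    return [vocab.get(w, vocab["<unk>"]) for w in words]

print(to_ids(["hello", "speech"]))        # [1, 3]
print(to_ids(["hello", "interspeech"]))   # [1, 0] -> "interspeech" is OOV
```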
“…However, the pruning algorithm is applied to the entire network, and there has been no investigation into subnetwork-wise parameter freezing. Some studies on domain adaptation of ASR models have shown that updating only part of the layers improves performance on the target domain [16,17]. However, there has been no research on its robustness against catastrophic forgetting.…”
Section: Introduction
confidence: 99%
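Catastrophic forgetting is typically quantified by re-scoring the source domain after adaptation. The sketch below computes word error rate (WER) with plain edit distance and compares fabricated pre- and post-adaptation hypotheses; a large WER increase on source-domain data would signal forgetting.

```python
# Hedged sketch: measure forgetting as the source-domain WER gap
# between the pre-adaptation and post-adaptation models.
def wer(ref, hyp):
    """Word error rate via Levenshtein distance over word lists."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

ref = "the cat sat on the mat".split()
hyp_before = "the cat sat on the mat".split()   # fabricated example output
hyp_after = "the hat sat on mat".split()        # fabricated example output
print(wer(ref, hyp_before))   # 0.0
print(wer(ref, hyp_after))    # ~0.33 -> source-domain degradation
```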