Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1006
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

Abstract: We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data are available. Through an ablation study, we find that the pre-tra…
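The recipe in the abstract is a standard transfer-learning setup, so a minimal sketch may help make it concrete. The code below is not the authors' released implementation: the SpeechSeq2Seq module, layer sizes, and file names are illustrative assumptions, and it only shows how ASR pre-training followed by ST fine-tuning can share one set of parameters.

```python
# Minimal sketch of the pre-train-then-fine-tune recipe (hypothetical module
# names and sizes, not the authors' code): pre-train a speech encoder-decoder
# on high-resource English ASR, then fine-tune the same parameters on
# low-resource Spanish-English speech translation.
import torch
import torch.nn as nn

class SpeechSeq2Seq(nn.Module):
    """Generic encoder-decoder over speech features (attention omitted for brevity)."""
    def __init__(self, feat_dim=80, hidden=256, vocab_size=10000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats, prev_tokens):
        enc_out, _ = self.encoder(feats)                  # (B, T, H)
        dec_out, _ = self.decoder(self.embed(prev_tokens))
        return self.out(dec_out)                          # logits over target vocab

# 1) Pre-train on high-resource English ASR (speech -> English transcript),
#    then save the parameters.
asr_model = SpeechSeq2Seq()
# ... train asr_model on ~300 h of ASR data ...
torch.save(asr_model.state_dict(), "asr_pretrained.pt")

# 2) Fine-tune for Spanish-English ST: initialise the ST model from the ASR
#    checkpoint and continue training on the ~20 h of ST data.
#    strict=False tolerates any layers that differ between the two setups.
st_model = SpeechSeq2Seq()
st_model.load_state_dict(torch.load("asr_pretrained.pt"), strict=False)
# ... train st_model on the low-resource ST data ...
```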

Cited by 131 publications (148 citation statements).
References 34 publications (30 reference statements).
“…The common way is to use an ASR encoder and an MT decoder to initialize the corresponding parameters of the ST network [20]. Surprisingly, using an ASR model to pre-train both the encoder and the decoder of the ST model also works well [19]. Following [30], we automatically recompute the provided audio-to-source-sentence alignments to reduce the problem of speech segments without a translation.…”
Section: Pre-training
confidence: 99%
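As a rough illustration of the initialisation scheme described in the statement above, the sketch below copies encoder parameters from an ASR checkpoint and decoder parameters from an MT checkpoint into an ST model. It reuses the hypothetical SpeechSeq2Seq model from the earlier sketch; the checkpoint names and the "encoder."/"decoder." parameter-name prefixes are assumptions.

```python
# Rough sketch of "ASR encoder + MT decoder" initialisation for an ST model.
# SpeechSeq2Seq is the hypothetical model from the sketch above; checkpoint
# files and parameter-name prefixes are assumed, not taken from any paper.
import torch

st_model = SpeechSeq2Seq()
st_state = st_model.state_dict()
asr_state = torch.load("asr_pretrained.pt")   # trained on high-resource ASR
mt_state = torch.load("mt_pretrained.pt")     # trained on text-to-text MT

for name in st_state:
    if name.startswith("encoder.") and name in asr_state:
        st_state[name] = asr_state[name]      # speech encoder from the ASR model
    elif name in mt_state:
        st_state[name] = mt_state[name]       # embeddings, decoder, output layer from MT

st_model.load_state_dict(st_state)
# Alternatively, as in [19], both encoder and decoder can be taken from the ASR
# checkpoint alone, which reportedly works well despite the task mismatch.
```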
“…This large degradation led us to investigate further why pre-training the text decoder with an MT model hurts. To investigate this, we first try pre-training both the encoder and the decoder with our ASR model, as suggested in [19]. Since the ASR decoder is already familiar with the ASR encoder, this problem should disappear.…”
Section: Pre-training
confidence: 99%
“…One of the most recent and successful data augmentation methods, SpecAugment [3], modifies the spectrogram with time warping, frequency masking and time masking. AST methods to leverage ASR and MT data include pretraining [4], multitask learning [5] and weakly supervised data augmentation [6,7].…”
Section: Introduction
confidence: 99%
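The SpecAugment masking mentioned in the last statement is simple enough to sketch directly. The snippet below applies frequency and time masking to a log-mel spectrogram (time warping omitted); the function name and parameter values are placeholders, not the settings used in [3].

```python
# Illustrative SpecAugment-style masking on a log-mel spectrogram:
# zero out random bands of mel channels (frequency masks) and random
# spans of frames (time masks). Time warping is omitted for brevity.
import torch

def spec_augment(spec, freq_mask=27, time_mask=100, n_masks=2):
    """spec: (n_mels, n_frames) tensor; returns a copy with masked regions set to zero."""
    spec = spec.clone()
    n_mels, n_frames = spec.shape
    for _ in range(n_masks):
        # Frequency mask: a random band of consecutive mel channels.
        f = torch.randint(0, freq_mask + 1, (1,)).item()
        f0 = torch.randint(0, max(1, n_mels - f), (1,)).item()
        spec[f0:f0 + f, :] = 0.0
        # Time mask: a random span of consecutive frames.
        t = torch.randint(0, time_mask + 1, (1,)).item()
        t0 = torch.randint(0, max(1, n_frames - t), (1,)).item()
        spec[:, t0:t0 + t] = 0.0
    return spec

augmented = spec_augment(torch.randn(80, 1000))  # e.g. 80 mel bins, 1000 frames
```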