2022
DOI: 10.48550/arxiv.2203.08757
Preprint

Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation

Abstract: End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language. Such data are notoriously scarce, making synthetic data augmentation by backtranslation or knowledge distillation a necessary ingredient of end-to-end training. In this paper, we present a novel approach to data augmentation that leverages audio alignments, linguistic properties, and translation. First, we augment a transcription by sampling from a suffix memory that stor…

Cited by 1 publication (1 citation statement)
References 16 publications

“…The lack of end-to-end training data is another obstacle to the end-to-end ST system. Recent work address this obstacle by leveraging the available data resource using multi-task learning [7,8,9], transfer learning [10,11] and generating synthetic data [12,13,14] techniques. This work extends this idea with a novel loss function to efficiently use the available data.…”
Section: Related Work (mentioning)
confidence: 99%