Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2022
DOI: 10.18653/v1/2022.acl-short.27

Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation

Abstract: End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language. Such data are notoriously scarce, making synthetic data augmentation by backtranslation or knowledge distillation a necessary ingredient of end-to-end training. In this paper, we present a novel approach to data augmentation that leverages audio alignments, linguistic properties, and translation. First, we augment a transcription by sampling from a suffix memory that stores …
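The abstract describes building a suffix memory over training transcriptions using audio alignments. As a rough illustration of the idea only, and not the paper's actual implementation, such a memory can be sketched as a mapping from a pivot token to the word-level suffixes (with their audio time spans) that follow it somewhere in the training data. All field names (`words`, `audio_path`) and the data layout below are hypothetical assumptions.

```python
from collections import defaultdict

def build_suffix_memory(examples):
    """Hypothetical sketch: index word-level suffixes of training transcriptions
    by the token they follow (the 'pivot'), keeping the audio span that the
    word-level alignment provides so the suffix audio can later be reused."""
    memory = defaultdict(list)
    for ex in examples:
        # ex["words"]: list of (token, start_sec, end_sec) from a forced aligner
        words = ex["words"]
        for i, (token, _, _) in enumerate(words[:-1]):
            suffix = words[i + 1:]
            memory[token].append({
                "audio_path": ex["audio_path"],
                "suffix_tokens": [w[0] for w in suffix],
                "audio_span": (suffix[0][1], suffix[-1][2]),  # start/end of the suffix audio
            })
    return memory
```

The key design point, under these assumptions, is that each stored suffix carries both its token sequence and its audio span, so recombination can splice speech and transcription consistently.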

Cited by 8 publications (4 citation statements)
References: 1 publication
“…Within NLP, the data augmentation technique has gained substantial traction to expand the pool of available training instances. This approach finds widespread application across diverse domains, including text classification (Wu et al. 2022; Liu et al. 2022; Ouyang et al. 2022), neural machine translation (Lam, Schamoni, and Riezler 2022; Kambhatla, Born, and Sarkar 2022; Gao et al. 2019), and text generation (Bi, Li, and Huang 2021; Xu et al. 2021). Notably, recent strides in ABSA have similarly leveraged data augmentation (Chen, Faulkner, and Badyal 2022; Wang et al. 2022a; Hsu et al. 2021).…”
Section: Data Augmentation (mentioning)
confidence: 99%
“…Our approach, however, segments documents at arbitrary points, thus providing access to a greater number of synthetic examples. An alternative approach by Lam et al. (2022b) involves recombining training data in a linguistically motivated way, by sampling pivot tokens, retrieving possible continuations from a suffix memory, combining them to obtain new speech-transcription pairs, and finally using an MT model to generate the translations. Our method is similar since it also leverages audio alignments and MT, but instead of mixing speech, it segments at alternative points.…”
Section: Relevant Research (mentioning)
confidence: 99%
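The statement above summarizes the sample-translate-recombine loop: sample a pivot, keep the prefix up to it, retrieve a stored continuation from the suffix memory, and translate the recombined transcription. A minimal, hypothetical sketch of that loop is shown below; `translate` stands in for an arbitrary MT system, not the authors' model, and the data layout matches the illustrative memory sketch above rather than the paper's actual code.

```python
import random

def recombine(example, memory, translate, rng=random.Random(0)):
    """Hypothetical sketch of sample-translate-recombine:
    1) sample a pivot token inside the utterance,
    2) keep the audio/transcription prefix up to the pivot,
    3) retrieve a stored continuation that follows the same pivot elsewhere,
    4) translate the recombined transcription with an MT model."""
    words = example["words"]                      # (token, start_sec, end_sec) triples
    if len(words) < 3:
        return None                               # too short to split meaningfully
    pivot_idx = rng.randrange(1, len(words) - 1)  # sample a pivot position
    pivot_token = words[pivot_idx][0]
    candidates = memory.get(pivot_token)
    if not candidates:
        return None                               # no continuation stored for this pivot
    cont = rng.choice(candidates)
    new_transcript = " ".join(
        [w[0] for w in words[: pivot_idx + 1]] + cont["suffix_tokens"]
    )
    return {
        "prefix_audio": (example["audio_path"], words[0][1], words[pivot_idx][2]),
        "suffix_audio": (cont["audio_path"], *cont["audio_span"]),
        "transcription": new_transcript,
        "translation": translate(new_transcript),
    }
```

Under these assumptions, the synthetic example pairs spliced audio (prefix span plus retrieved suffix span) with a transcription whose translation is generated by MT, which is what makes the augmented data usable for end-to-end training.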
“…Due to the relatively short duration of the examples, we only apply SEGAUGMENT with the short and medium configurations. In Table 5 we provide our results for En-De, with and without SEGAUGMENT, a bilingual baseline from prior work, and the recently proposed Sample-Translate-Recombine (STR) augmentation method (Lam et al., 2022b), which uses the same model architecture as our experiments. Although designed for document-level data, SEGAUGMENT brings significant improvements over the baseline, even outperforming the STR augmentation method by 0.5 BLEU points.…”
Section: Application on Sentence-level Data (mentioning)
confidence: 99%
“…However, due to the inherent complexity and variation of speech signals and the scarcity of high-quality E2E ST data, achieving satisfactory performance remains challenging. Over the years, a variety of approaches have been proposed to address these issues, such as pre-training (Wang et al., 2020b; Tang et al., 2021b; Dong et al., 2021), multi-task learning (Vydana et al., 2021; Ye et al., 2021; Tang et al., 2022), data augmentation (Jia et al., 2019; Lam et al., 2022), contrastive learning and knowledge distillation (Tang et al., 2021a).…”
Section: Related Work (mentioning)
confidence: 99%