Abstract: Document-level contextual information has shown benefits for text-based machine translation, but whether and how context helps end-to-end (E2E) speech translation (ST) is still under-studied. We fill this gap through extensive experiments using a simple concatenation-based context-aware ST model, paired with adaptive feature selection on speech encodings for computational efficiency. We investigate several decoding approaches, and introduce in-model ensemble decoding, which jointly performs document- and sentence-level decoding. …
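The concatenation-based context approach described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's exact setup: the separator token `<sep>` and the context window size are assumptions.

```python
SEP = "<sep>"  # assumed context-separator token, not necessarily the paper's

def build_contextual_source(sentences, context_size=1):
    """Prepend up to `context_size` preceding sentences to each sentence,
    joined by a separator, to form context-aware model inputs."""
    examples = []
    for i, sent in enumerate(sentences):
        context = sentences[max(0, i - context_size):i]
        examples.append(SEP.join(context + [sent]) if context else sent)
    return examples
```

In a real E2E ST system the concatenation would happen over speech encodings rather than text, which is where the adaptive feature selection mentioned above keeps the longer inputs computationally tractable.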
“…We focus on improving translation quality of conversations by speaker-turn and cross-talk detection, yet using the context information could also help. In addition, within each MT-MS segment, the inter-utterance context could have already been leveraged (Zhang et al., 2021). We leave analysis of the inter- and intra-segment context as future work.…”
Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end, multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combines automatic speech recognition, speech translation, and speaker turn detection using special tokens in a serialized labeling format. We run experiments on the Fisher-CALLHOME corpus, which we adapted by merging the two single-speaker channels into one multi-speaker channel, thus representing the more realistic and challenging scenario with multi-speaker turns and cross-talk. Experimental results across single- and multi-speaker conditions and against conventional ST systems show that our model outperforms the reference systems on the multi-speaker condition, while attaining comparable performance on the single-speaker condition. We release scripts for data processing and model training. (Work conducted during an internship at Amazon.)
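The serialized labeling format with speaker-turn tokens described above can be sketched roughly as follows. The token name `<turn>` and the utterance tuple layout are illustrative assumptions, not the paper's exact specification.

```python
TURN = "<turn>"  # assumed speaker-change token

def serialize(utterances):
    """utterances: list of (start_time, speaker_id, text) tuples from the
    merged single channel, possibly interleaved across speakers.
    Returns one serialized target string with a turn token inserted at
    each change of speaker, in temporal order."""
    tokens = []
    prev_speaker = None
    for _start, speaker, text in sorted(utterances, key=lambda u: u[0]):
        if prev_speaker is not None and speaker != prev_speaker:
            tokens.append(TURN)
        tokens.append(text)
        prev_speaker = speaker
    return " ".join(tokens)
```

Training on such serialized targets lets a single sequence-to-sequence model learn transcription or translation and turn detection jointly, rather than running a separate diarization stage.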
“…Context-aware ST models have been shown to be robust towards error-prone automatic segmentations of the test set at inference time (Zhang et al., 2021a). Our method bears similarities with Gaido et al. (2020b) and Papi et al. (2021) in that it re-segments the train set to create synthetic data.…”
End-to-end Speech Translation is hindered by a lack of available data resources. While most of them are based on documents, a sentence-level version is available, which, however, is single and static, potentially impeding the usefulness of the data. We propose a new data augmentation strategy, SegAugment, to address this issue by generating multiple alternative sentence-level versions of a dataset. Our method utilizes an Audio Segmentation system, which re-segments the speech of each document with different length constraints, after which we obtain the target text via alignment methods. Experiments demonstrate consistent gains across eight language pairs in MuST-C, with an average increase of 2.5 BLEU points, and up to 5 BLEU for low-resource scenarios in mTEDx. Furthermore, when combined with a strong system, SegAugment obtains state-of-the-art results in MuST-C. Finally, we show that the proposed method can also successfully augment sentence-level datasets, and that it enables Speech Translation models to close the gap between the manual and automatic segmentation at inference time.
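The core idea of re-segmenting with different length constraints can be illustrated with a toy greedy packer over word-level timings. This is a simplified sketch under assumed inputs (word-level alignments), not SegAugment's actual segmentation system.

```python
def resegment(words, max_dur):
    """words: list of (word, start_sec, end_sec) in time order.
    Greedily packs consecutive words into segments no longer than
    `max_dur` seconds. Calling this with several different `max_dur`
    values yields alternative segmentations of the same document."""
    segments, current, seg_start = [], [], None
    for word, start, end in words:
        if current and end - seg_start > max_dur:
            segments.append(" ".join(current))
            current, seg_start = [], None
        if seg_start is None:
            seg_start = start
        current.append(word)
    if current:
        segments.append(" ".join(current))
    return segments
```

Each alternative segmentation, once paired with aligned target text, becomes an additional sentence-level training set derived from the same documents.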
“…With regard to exploiting streaming history, or more generally sentence context, it is worth mentioning the significant amount of previous work in offline MT at sentence level (Tiedemann and Scherrer, 2017; Agrawal et al., 2018), document level (Scherrer et al., 2019; Ma et al., 2020a; Zheng et al., 2020b; Li et al., 2020; Maruf et al., 2021; Zhang et al., 2021), and in related areas such as language modelling (Dai et al., 2019) that has proved to lead to quality gains. Also, as reported in (Li et al., 2020), more robust ST systems can be trained by taking advantage of the context across sentence boundaries using a data augmentation strategy similar to the prefix training methods proposed in (Niehues et al., 2018; Ma et al., 2019).…”
Simultaneous Machine Translation is the task of incrementally translating an input sentence before it is fully available. Currently, simultaneous translation is carried out by translating each sentence independently of the previously translated text. More generally, Streaming MT can be understood as an extension of Simultaneous MT to the incremental translation of a continuous input text stream. In this work, a state-of-the-art simultaneous sentence-level MT system is extended to the streaming setup by leveraging the streaming history. Extensive empirical results are reported on IWSLT Translation Tasks, showing that leveraging the streaming history leads to significant quality gains. In particular, the proposed system proves to compare favorably to the best-performing systems.
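Leveraging streaming history can be sketched as conditioning each new sentence's translation on a sliding window of previous (source, target) pairs. The `translate(source, history)` interface below is a hypothetical stand-in for a sentence-level MT model, not the paper's actual system.

```python
def stream_translate(sentences, translate, history_size=2):
    """Translate a stream sentence by sentence, passing the last
    `history_size` (source, target) pairs as context to the model.
    `translate` is an assumed callable: (source_str, history_list) -> str."""
    history, outputs = [], []
    for src in sentences:
        tgt = translate(src, history[-history_size:])
        outputs.append(tgt)
        history.append((src, tgt))
    return outputs
```

The window keeps inference cost bounded while still exposing cross-sentence context that a purely sentence-independent simultaneous system would discard.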