Proceedings of the Third Workshop on Automatic Simultaneous Translation 2022
DOI: 10.18653/v1/2022.autosimtrans-1.2
Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation

Abstract: Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency, which is normally computed in terms of Average Lagging (AL). In this paper we highlight that, despite its widespread adoption, AL provides underestimated scores for systems that generate longer predictions compared to the corresponding references. We also show that this problem has practical relevance, as recent SimulST systems have indeed a tendency to over-generate. As a solution, we propose LAAL…
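The abstract's point can be illustrated with a small sketch. The function below is an illustrative reimplementation of AL (Ma et al., 2019) and its length-adaptive variant LAAL (Papi et al., 2022), not the SimulEval code: it assumes per-token emission delays measured in the same units as the source length (words, frames, or milliseconds), and applies the LAAL correction of scaling by max(reference length, hypothesis length) so that over-generated tokens can no longer drive the score down.

```python
def average_lagging(delays, src_len, ref_len, hyp_len=None):
    """Average Lagging over a sequence of emission delays.

    delays[i] is how much source (words, frames, or ms) had been read
    when target token i+1 was emitted.  With hyp_len given, the scaling
    factor gamma uses max(ref_len, hyp_len) instead of ref_len, which is
    the LAAL correction: longer-than-reference outputs no longer shrink
    the reported lagging.
    """
    tgt_len = ref_len if hyp_len is None else max(ref_len, hyp_len)
    gamma = tgt_len / src_len
    # tau: first target position emitted after the whole source was consumed
    tau = next((i for i, d in enumerate(delays, 1) if d >= src_len),
               len(delays))
    return sum(d - (i - 1) / gamma
               for i, d in enumerate(delays[:tau], 1)) / tau


# An over-generating system: 4 source words, 4-word reference,
# but an 8-token hypothesis (the last 5 tokens emitted after the full source).
delays = [1, 2, 3, 4, 4, 4, 4, 4]
al = average_lagging(delays, src_len=4, ref_len=4)               # 1.0
laal = average_lagging(delays, src_len=4, ref_len=4, hyp_len=8)  # 1.75
```

In this toy example the reference-scaled AL (1.0) is lower than LAAL (1.75): the inflated hypothesis length would otherwise be rewarded with a smaller lagging score. LAAL is never smaller than AL and coincides with it whenever the hypothesis is not longer than the reference.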

Cited by 8 publications (6 citation statements). References 24 publications (42 reference statements).
“…• Average Lagging (AL; Ma et al., 2019) • Length-Adaptive Average Lagging (LAAL; Polák et al., 2022; Papi et al., 2022) • Average Token Delay (ATD; ) • Average Proportion (AP; Cho and Esipova, 2016) • Differentiable Average Lagging (DAL; Cherry and Foster, 2019). We also measured the computation-aware version of the latency metrics, as described by . However, due to the new synchronized SimulEval agent pipeline design, the actual computation-aware latency can be smaller with carefully designed parallelism.…”
Section: Discussion
confidence: 99%
“…All models are evaluated using the SimulEval [19] toolkit. For translation quality, we report detokenized case-sensitive BLEU [37], and for latency, we report length-adaptive average lagging (LAAL) [7,38]. In all our experiments, we use beam search with a beam size of 6.…”
Section: Models
confidence: 99%
“…This not only affects the resulting quality but also negatively impacts the reliability of the AL latency evaluation. Therefore, we proposed an improved version of the AL metric, which was later independently proposed under the name length-adaptive average lagging (LAAL; Papi et al., 2022). To remedy the over-generation problem, we proposed an improved version of the beam search algorithm in Polák et al. (2023b).…”
Section: Quality-Latency Tradeoff in SST
confidence: 99%