Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018
DOI: 10.18653/v1/d18-1337
Prediction Improves Simultaneous Neural Machine Translation

Abstract: Simultaneous speech translation aims to maintain translation quality while minimizing the delay between reading input and incrementally producing the output. We propose a new general-purpose prediction action which predicts future words in the input to improve quality and minimize delay in simultaneous translation. We train this agent using reinforcement learning with a novel reward function. Our agent with prediction has better translation quality and less delay compared to an agent-based simultaneous transla…
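The abstract describes an agent trained with reinforcement learning using a reward that trades translation quality against delay. As a rough, hypothetical illustration of that general shape only (the paper's actual reward function differs in its details; `quality` and `alpha` are illustrative names), such a reward can combine a terminal quality score with a per-step penalty for waiting:

```python
# Hedged sketch: a latency-aware RL reward of the general form used for
# simultaneous translation agents -- terminal translation quality minus a
# per-step delay penalty. Not the paper's exact formulation.

def step_reward(action, alpha=0.1):
    """Small negative reward for each READ (waiting) action."""
    return -alpha if action == "READ" else 0.0

def episode_reward(actions, quality, alpha=0.1):
    """Terminal quality (e.g. sentence-level BLEU) plus summed delay penalties."""
    return quality + sum(step_reward(a, alpha) for a in actions)

# Two reads then two writes, with quality 0.8: 0.8 - 2 * 0.1 = 0.6
print(episode_reward(["READ", "READ", "WRITE", "WRITE"], quality=0.8))
```

Tuning `alpha` shifts the agent along the quality/latency trade-off: a larger penalty pushes it to write earlier.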

Cited by 73 publications (50 citation statements). References 11 publications (16 reference statements).
“…Other measures of lagging include Average proportion (AP) [8] and Differentiable Average Lagging (DAL) [18]. AP is unfavorable to short sequences and is incapable of highlighting improvement as it occupies a narrow range [1,18,29]. DAL is a differentiable version of AL used to regularize trainable decoders, and behaves similarly to AL.…”
Section: Training Wait-k Models
confidence: 99%
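The metrics contrasted above can be stated compactly. Writing g(t) for the number of source tokens read before emitting target token t, AP averages g(t) over all source–target positions, while AL averages the lag g(t) − (t−1)/γ up to the first step that has read the whole source (γ = |y|/|x|). A minimal sketch, with illustrative function names rather than any particular library's API:

```python
# Average Proportion (AP) and Average Lagging (AL) from a delay sequence g,
# where g[t-1] = number of source tokens read before target token t.

def average_proportion(g, src_len, tgt_len):
    """AP = (1 / (|x||y|)) * sum_t g(t); confined to a narrow range in practice."""
    return sum(g) / (src_len * tgt_len)

def average_lagging(g, src_len, tgt_len):
    """AL averages g(t) - (t - 1) / gamma up to tau, the first step
    whose delay equals the full source length (gamma = |y| / |x|)."""
    gamma = tgt_len / src_len
    tau = next(t for t, gt in enumerate(g, start=1) if gt == src_len)
    return sum(g[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau

# A wait-3 policy on a 10:10 pair has g(t) = min(3 + t - 1, 10).
g = [min(3 + t - 1, 10) for t in range(1, 11)]
print(average_proportion(g, 10, 10))  # 0.72
print(average_lagging(g, 10, 10))     # 3.0
```

Note that AL recovers k exactly for a wait-k policy on equal-length sequences, which is one reason it is easier to interpret than AP.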
“…Knowing that a very-low-latency wait-1 system incurs at best an AP of 0.5 also implies that much of the metric's dynamic range is wasted; in fact, Alinejad et al (2018) report that AP is not sufficiently sensitive to detect their improvements to simultaneous MT.…”
Section: Previous Latency Metrics
confidence: 99%
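The wait-1 claim above is easy to check numerically: on an n:n sentence pair, wait-1 reads g(t) = t source tokens before target token t, so AP = (n + 1)/(2n), which approaches 0.5 from above as n grows. A quick sketch:

```python
# Verifying that wait-1 AP tends to 0.5 from above as sequences lengthen,
# so only the (0.5, 1] half of AP's range is usable for this policy.

for n in (10, 100, 1000):
    g = list(range(1, n + 1))   # wait-1 delays on an n:n pair: g(t) = t
    ap = sum(g) / (n * n)       # AP = (n + 1) / (2n)
    print(n, ap)                # 0.55, 0.505, 0.5005
```

This is the "wasted dynamic range" the passage refers to: even the lowest-latency policy already sits at AP ≈ 0.5, leaving little room to distinguish systems.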
“…A separate policy model could avoid these issues. However, previous policy-learning methods either depend on reinforcement learning (RL) (Grissom II et al, 2014;Gu et al, 2017;Alinejad et al, 2018), which makes the training process unstable and inefficient due to exploration, or apply advanced attention mechanisms (Arivazhagan et al, 2019), which require the training process to be autoregressive, and hence inefficient. Furthermore, each such learned policy cannot change its behaviour according to different latency requirements at testing time, and we will need to train multiple policy models for scenarios with different latency requirements.…”
Section: Introduction
confidence: 99%
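The fixed policy the passage contrasts with learned ones is the wait-k schedule: read k source tokens, then alternate WRITE and READ until the source is exhausted. A minimal sketch of the action sequence it produces (illustrative, not any system's actual decoding loop):

```python
# Wait-k as a fixed read/write schedule. Adjusting latency here is just
# changing k -- the per-requirement flexibility the quoted passage notes
# a single learned policy lacks at test time.

def wait_k_actions(k, src_len, tgt_len):
    actions, read, written = [], 0, 0
    while written < tgt_len:
        if read < min(k + written, src_len):
            actions.append("READ")    # consume one more source token
            read += 1
        else:
            actions.append("WRITE")   # emit one target token
            written += 1
    return actions

print(wait_k_actions(2, 3, 3))
# ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'WRITE']
```

A learned policy replaces the fixed `read < min(k + written, src_len)` test with a model decision at each step, which is what the RL-based and attention-based approaches above train.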