Findings of the Association for Computational Linguistics: NAACL 2022
DOI: 10.18653/v1/2022.findings-naacl.105

Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

Abstract: The Transformer architecture has led to significant gains in machine translation. However, most studies focus on only sentence-level translation without considering the context dependency within documents, leading to the inadequacy of document-level coherence. Some recent research tried to mitigate this issue by introducing an additional context encoder or translating with multiple sentences or even the entire document. Such methods may lose the information on the target side or have an increasing computationa…

Cited by 6 publications (5 citation statements)
References 1 publication
“…Following previous work (Bao et al., 2021; Sun et al., 2022; Feng et al., 2022), we apply the sentence-level BLEU score (s-BLEU) and the document-level BLEU score (d-BLEU) as evaluation metrics. Since our methods focus on DocMT and do not involve sentence alignments, the d-BLEU score, which matches n-grams in the whole document, is our major metric.…”
Section: Datasets and Settings
Citation type: mentioning; confidence: 99%
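To make the metric distinction in the statement above concrete, the following is a minimal sketch (not code from the cited paper or the citing papers) of computing s-BLEU and d-BLEU with sacrebleu; the toy sentences and document grouping are illustrative assumptions. The only real difference is the unit over which n-grams are matched: d-BLEU concatenates each document before scoring, so no sentence alignment is required and n-grams may span sentence boundaries.

# Illustrative sketch of s-BLEU vs. d-BLEU with sacrebleu (hypothetical data).
import sacrebleu

# Hypothetical document: translated sentences and their references,
# assumed to be sentence-aligned for s-BLEU.
hyp_sents = ["He opened the door .", "Then he sat down ."]
ref_sents = ["He opened the door .", "Then he sat down quietly ."]

# s-BLEU: standard corpus BLEU over aligned sentence pairs.
s_bleu = sacrebleu.corpus_bleu(hyp_sents, [ref_sents])

# d-BLEU: concatenate each document into a single string, so n-grams are
# matched over the whole document and no sentence alignment is needed.
hyp_doc = [" ".join(hyp_sents)]
ref_doc = [" ".join(ref_sents)]
d_bleu = sacrebleu.corpus_bleu(hyp_doc, [ref_doc])

print(f"s-BLEU: {s_bleu.score:.2f}  d-BLEU: {d_bleu.score:.2f}")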
“…Among these methods, the dominant approaches still adhere to the sentence-by-sentence mode, but they utilize additional contextual information, including the surrounding sentences (Miculicich et al., 2018; Kang et al., 2020; Zhang et al., 2020b, 2021a), document contextual representations (Jiang et al., 2020; Ma et al., 2020), and memory units (Feng et al., 2022). In recent years, much research has turned to translating multiple sentences or the entire document at once (Tan et al., 2019; Bao et al., 2021; Sun et al., 2022).…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…Dai et al. (2019) introduced a recurrence mechanism and an improved positional encoding scheme into the Transformer. Later work proposed an explicit compressed memory realized by a few dense vectors (Feng et al., 2022).…”
Section: Long-form MT
Citation type: mentioning; confidence: 99%
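As a rough illustration of the compressed recurrent memory idea mentioned above (this is not the architecture of Feng et al., 2022; the module names, sizes, and update rule are all assumptions for the sketch), a small number of dense memory vectors can be prepended to each segment, updated jointly with the segment by the encoder, and carried over to the next segment:

# Sketch only: a few dense memory vectors threaded through segment encoding.
# Not the cited paper's implementation; all hyperparameters are illustrative.
import torch
import torch.nn as nn

class MemoryAugmentedEncoder(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=2, n_mem=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learned initial memory: a small, fixed number of dense vectors.
        self.init_memory = nn.Parameter(torch.randn(n_mem, d_model))
        self.n_mem = n_mem

    def forward(self, segment, memory=None):
        # segment: (batch, seg_len, d_model); memory: (batch, n_mem, d_model)
        if memory is None:
            memory = self.init_memory.unsqueeze(0).expand(segment.size(0), -1, -1)
        # Memory slots attend jointly with the current segment's tokens.
        hidden = self.encoder(torch.cat([memory, segment], dim=1))
        # The updated memory (first n_mem positions) is passed to the next segment.
        new_memory, seg_states = hidden[:, :self.n_mem], hidden[:, self.n_mem:]
        return seg_states, new_memory

# Usage: encode a document segment by segment, threading the memory through.
enc = MemoryAugmentedEncoder()
memory = None
for seg in torch.randn(3, 2, 10, 512):   # 3 segments, batch 2, 10 tokens each
    states, memory = enc(seg, memory)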
“…As pointed out in Sections 3.2 to 3.4, performance usually drops with a context longer than a few sentences. Some solutions have been suggested (Kim et al., 2019; Feng et al., 2022), but it remains unclear how to adapt these approaches to SST given its specifics (e.g., computational constraints, speech input).…”
Section: Towards the Long-form SST via…
Citation type: mentioning; confidence: 99%
“…During the last decade, neural machine translation (NMT) has made remarkable progress and become the state-of-the-art approach, especially for sentence-level translation [1,2]. In document-level translation, it is widely accepted that introducing discourse dependencies between sentences can improve the coherence and quality of the translated text [3,4]. Like those for sentence-level NMT, most existing document-level NMT (DocNMT) models integrate contextual information using an attention mechanism.…”
Section: Introduction
Citation type: mentioning; confidence: 99%