2022
DOI: 10.48550/arxiv.2205.01546
Preprint

Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

Abstract: The Transformer architecture has led to significant gains in machine translation. However, most studies focus only on sentence-level translation, ignoring context dependency within documents and leading to inadequate document-level coherence. Some recent research has tried to mitigate this issue by introducing an additional context encoder or by translating multiple sentences or even the entire document at once. Such methods may lose information on the target side or have an increasing computational…

Cited by 1 publication (4 citation statements)
References 23 publications
“…In this paper, we focus on the speed-up of context-aware NMT when the global context is involved. Note that Wang et al [37] and Feng et al [51] both use recurrent networks to capture contextual information, which is similar to our work. Compared with Wang et al [37]'s work, we treat the document translation as a continuous process.…”
Section: Related Work (supporting)
confidence: 84%
“…The memory unit stored contextual information with multiple vectors and it was updated by the extra attention modules of each layer in both the encoder and decoder. Compared with Feng et al [51]'s work, our work is more efficient because of the simplicity of the architecture. In our work, no additional modules are required other than the RNN units.…”
Section: Related Work (mentioning)
confidence: 87%
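The statement above describes the cited paper's mechanism only in prose: a memory of multiple vectors that extra attention modules in each encoder and decoder layer read and update across sentences. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual implementation; all class, parameter, and slot-count choices are assumptions for readability.

```python
# Minimal sketch (assumed design, illustrative names/shapes) of a recurrent memory
# that is read and updated by extra attention modules inside a Transformer layer.
import torch
import torch.nn as nn

class MemoryAugmentedLayer(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, n_mem_slots: int = 4):
        super().__init__()
        # Standard Transformer encoder layer for the current sentence.
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Extra attention module letting tokens read the document memory ...
        self.read_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # ... and another letting the memory slots absorb the new sentence (recurrent update).
        self.write_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.n_mem_slots = n_mem_slots

    def forward(self, x: torch.Tensor, memory: torch.Tensor):
        # x:      (batch, seq_len, d_model)     hidden states of the current sentence
        # memory: (batch, n_mem_slots, d_model) contextual vectors carried across sentences
        h = self.layer(x)
        # Read: tokens attend to the memory to inject document-level context.
        ctx, _ = self.read_attn(h, memory, memory)
        h = h + ctx
        # Write: memory slots attend to the new hidden states, producing the updated
        # memory that is passed on to the next sentence (the recurrent step).
        new_memory, _ = self.write_attn(memory, h, h)
        return h, new_memory

# Usage: process a document sentence by sentence, carrying the memory forward.
layer = MemoryAugmentedLayer()
memory = torch.zeros(1, 4, 512)  # empty memory at the start of the document
for sent in [torch.randn(1, 20, 512), torch.randn(1, 15, 512)]:
    out, memory = layer(sent, memory)
```

Because the memory is a small, fixed number of vectors, each sentence attends to a constant-size summary of the preceding document rather than to all previous tokens, which is what keeps this style of context modelling cheaper than translating the whole document at once.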