2022
DOI: 10.48550/arxiv.2205.01546
Preprint

Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

Abstract: The Transformer architecture has led to significant gains in machine translation. However, most studies focus only on sentence-level translation, ignoring context dependency within documents and leading to inadequate document-level coherence. Some recent research has tried to mitigate this issue by introducing an additional context encoder or by translating multiple sentences or even the entire document at once. Such methods may lose information on the target side or have an increasing computational…

Cited by 1 publication (4 citation statements)
References 23 publications
“…In this paper, we focus on the speed-up of context-aware NMT when the global context is involved. Note that Wang et al [37] and Feng et al [51] both use recurrent networks to capture contextual information, which is similar to our work. Compared with Wang et al [37]'s work, we treat the document translation as a continuous process.…”
Section: Related Work (supporting)
confidence: 84%
“…The memory unit stored contextual information with multiple vectors and it was updated by the extra attention modules of each layer in both the encoder and decoder. Compared with Feng et al [51]'s work, our work is more efficient because of the simplicity of the architecture. In our work, no additional modules are required other than the RNN units.…”
Section: Related Work (mentioning)
confidence: 87%
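The statement above describes the cited paper's mechanism only in prose: a memory of multiple vectors that extra attention modules in each encoder and decoder layer read and update across sentences. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual implementation; all class, parameter, and slot-count choices are assumptions for readability.

```python
# Minimal sketch (assumed design, illustrative names/shapes) of a recurrent memory
# that is read and updated by extra attention modules inside a Transformer layer.
import torch
import torch.nn as nn

class MemoryAugmentedLayer(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, n_mem_slots: int = 4):
        super().__init__()
        # Standard Transformer encoder layer for the current sentence.
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Extra attention module letting tokens read the document memory ...
        self.read_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # ... and another letting the memory slots absorb the new sentence (recurrent update).
        self.write_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.n_mem_slots = n_mem_slots

    def forward(self, x: torch.Tensor, memory: torch.Tensor):
        # x:      (batch, seq_len, d_model)     hidden states of the current sentence
        # memory: (batch, n_mem_slots, d_model) contextual vectors carried across sentences
        h = self.layer(x)
        # Read: tokens attend to the memory to inject document-level context.
        ctx, _ = self.read_attn(h, memory, memory)
        h = h + ctx
        # Write: memory slots attend to the new hidden states, producing the updated
        # memory that is passed on to the next sentence (the recurrent step).
        new_memory, _ = self.write_attn(memory, h, h)
        return h, new_memory

# Usage: process a document sentence by sentence, carrying the memory forward.
layer = MemoryAugmentedLayer()
memory = torch.zeros(1, 4, 512)  # empty memory at the start of the document
for sent in [torch.randn(1, 20, 512), torch.randn(1, 15, 512)]:
    out, memory = layer(sent, memory)
```

Because the memory is a small, fixed number of vectors, each sentence attends to a constant-size summary of the preceding document rather than to all previous tokens, which is what keeps this style of context modelling cheaper than translating the whole document at once.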