Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.267

G-Transformer for Document-Level Machine Translation

Abstract: Document-level MT models are still far from satisfactory. Existing work extends the translation unit from a single sentence to multiple sentences. However, studies show that when the translation unit is further enlarged to a whole document, supervised training of Transformer can fail. In this paper, we find that such failure is not caused by overfitting, but by sticking around local minima during training. Our analysis shows that the increased complexity of target-to-source attention is a reason for the failure. As a soluti…
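The abstract above is cut off before the proposed solution, so the following is only an illustrative sketch, not the published G-Transformer mechanism: it shows one generic way to bias target-to-source attention toward aligned sentence groups using a hypothetical block-diagonal mask, i.e., the kind of locality constraint that the abstract's analysis of attention complexity motivates. The function name and group-id scheme are assumptions for illustration.

# Illustrative sketch only: the abstract is truncated before the paper's solution,
# so this is NOT the published G-Transformer; it merely demonstrates restricting
# target-to-source attention to aligned sentence groups via a boolean mask.
import torch

def sentence_group_mask(src_groups, tgt_groups):
    """True where a target position may attend to a source position, i.e. where
    both positions carry the same (hypothetical) sentence-group id."""
    tgt = torch.tensor(tgt_groups).unsqueeze(1)   # (tgt_len, 1)
    src = torch.tensor(src_groups).unsqueeze(0)   # (1, src_len)
    return tgt == src                             # (tgt_len, src_len)

# A toy 2-sentence document: group ids mark which sentence each token belongs to.
src_groups = [0, 0, 0, 1, 1]   # 3 source tokens in sentence 0, 2 in sentence 1
tgt_groups = [0, 0, 1, 1, 1]   # 2 target tokens in sentence 0, 3 in sentence 1
mask = sentence_group_mask(src_groups, tgt_groups)

# Masking scores before the softmax confines each target token's cross-attention
# to its own source sentence, instead of spreading it over the whole document.
scores = torch.randn(len(tgt_groups), len(src_groups))
attn = scores.masked_fill(~mask, float("-inf")).softmax(dim=-1)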

Cited by 25 publications (43 citation statements). References 34 publications (35 reference statements).
“…The ability to process long sequences is critical for many Natural Language Processing tasks, including Document Summarization (Xiao and Carenini, 2019; Huang et al., 2021), Question Answering (Wang et al., 2020a), Information Extraction (Du and Cardie, 2020; Ebner et al., 2020; Du et al., 2022), and Machine Translation (Bao et al., 2021). However, the quadratic computational cost of self-attention in transformer-based models limits their application in long-sequence tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…The ability to process long sequences is critical for many Natural Language Processing tasks, including Document Summarization (Xiao and Carenini 2019; Huang et al. 2021), Question Answering (Wang et al. 2020b), Information Extraction (Li, Ji, and Han 2021; Du and Cardie 2020; Ebner et al. 2020), and Machine Translation (Bao et al. 2021). However, the quadratic computational cost of self-attention in transformer-based models limits their application in long-sequence tasks.…”
Section: Related Work (mentioning)
confidence: 99%
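The quadratic cost cited in the excerpts above is easy to make concrete. The short script below (assuming fp32 scores, a single attention head, and batch size 1) simply counts the n × n entries of one attention score matrix to show how memory grows when the translation unit grows from a sentence to a document.

# Rough back-of-the-envelope sizes for a single self-attention score matrix.
for n in (128, 512, 2048, 8192):
    entries = n * n                # one score per (query, key) pair
    mib = entries * 4 / 2**20      # 4 bytes per fp32 score
    print(f"seq_len={n:5d}  entries={entries:>12,d}  ~{mib:9.1f} MiB")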
“…Despite its simplicity, the concatenation approach has been shown to achieve competitive or superior performance to more sophisticated, multi-encoding systems (Lopes et al., 2020; Lupo et al., 2022a). However, learning with long concatenation sequences has been proven challenging for the Transformer architecture, because the self-attention can be "distracted" by long context (Bao et al., 2021).…”
Section: Introduction (mentioning)
confidence: 99%
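As a minimal sketch of the concatenation approach mentioned in the excerpt above (the separator token "<sep>", the helper name, and the fixed context window are assumptions for illustration; real systems differ), a sentence-level MT model can be fed the current sentence together with a few previous ones joined into a single input:

# Hypothetical helper: prepend up to `context_size` previous sentences so that a
# sentence-level MT model receives document context in one concatenated input.
def concatenate_with_context(sentences, current_idx, context_size=2, sep="<sep>"):
    start = max(0, current_idx - context_size)
    return f" {sep} ".join(sentences[start:current_idx + 1])

doc = ["He bought a bat.", "It was made of willow.", "He took it to the match."]
print(concatenate_with_context(doc, current_idx=2))
# He bought a bat. <sep> It was made of willow. <sep> He took it to the match.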