“…The ability to process long sequences is critical for many Natural Language Processing tasks, including Document Summarization (Xiao and Carenini, 2019; Huang et al., 2021), Question Answering (Wang et al., 2020a), Information Extraction (Du and Cardie, 2020; Ebner et al., 2020; Du et al., 2022), and Machine Translation (Bao et al., 2021). However, the quadratic computational cost of self-attention in transformer-based models limits their application to long-sequence tasks.…”
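The quadratic cost mentioned above comes from the attention score matrix, which pairs every token with every other token. A minimal NumPy sketch (illustrative only, not from the quoted work) of single-head self-attention makes the (n, n) term explicit; the identity Q/K/V projections are a simplifying assumption:

```python
import numpy as np

def self_attention(X):
    """Naive single-head self-attention over a length-n sequence.

    The score matrix Q @ K.T has shape (n, n), so both time and
    memory grow quadratically with the sequence length n.
    """
    n, d = X.shape
    Q, K, V = X, X, X                          # identity projections for simplicity
    scores = Q @ K.T / np.sqrt(d)              # (n, n) -- the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
for n in (128, 256):
    X = rng.standard_normal((n, 64))
    out = self_attention(X)
    # Doubling n quadruples the number of entries in the score matrix.
    print(n, out.shape, n * n)
```

Doubling the sequence length quadruples the score-matrix size, which is why long-document tasks such as those listed above strain standard transformers.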