Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1209

Graph Convolutional Encoders for Syntax-aware Neural Machine Translation

Abstract: We present a simple and effective approach to incorporating syntactic structure into neural attention-based encoder-decoder models for machine translation. We rely on graph-convolutional networks (GCNs), a recent class of neural networks developed for modeling graph-structured data. Our GCNs use predicted syntactic dependency trees of source sentences to produce representations of words (i.e. hidden states of the encoder) that are sensitive to their syntactic neighborhoods. GCNs take word representations as inp…
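For readers who want a concrete picture of the encoder layer the abstract describes, below is a minimal numpy sketch of a syntax-aware GCN layer in the spirit of the formulation this paper builds on: direction-specific weights (incoming, outgoing, self-loop) over predicted dependency edges, plus scalar edge gates. This is an assumption-laden illustration, not the authors' implementation; the function name, the `params` layout, and the gate parameterization are made up for the example, and dependency-label parameters are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def syntactic_gcn_layer(H, edges, params):
    """One syntax-aware GCN layer (illustrative sketch).

    H      : (n, d) word representations, e.g. RNN encoder states.
    edges  : (head, dependent) index pairs from a predicted parse.
    params : 'W_in', 'W_out', 'W_self' are (d, d) weight matrices;
             'gate_in', 'gate_out', 'gate_self' are (d,) vectors.
    """
    n, _ = H.shape
    out = np.zeros_like(H)
    # Each word receives a message from itself and from every
    # syntactic neighbour; the edge direction selects the weights.
    msgs = [(v, v, 'self') for v in range(n)]
    for head, dep in edges:
        msgs.append((dep, head, 'in'))   # dependent receives from head
        msgs.append((head, dep, 'out'))  # head receives from dependent
    for recv, send, direction in msgs:
        # Scalar gate lets the model down-weight edges from noisy parses.
        gate = sigmoid(H[send] @ params['gate_' + direction])
        out[recv] += gate * (H[send] @ params['W_' + direction])
    return np.maximum(out, 0.0)  # ReLU nonlinearity
```

Stacking k such layers lets information flow along dependency paths of length up to k, which is how the encoder states become sensitive to their syntactic neighborhoods.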

Cited by 464 publications (356 citation statements)
References 34 publications (42 reference statements)
“…Graph Neural Networks in NLP. Recently, graph neural networks have been shown to be successful in the NLP community, for example in modeling semantic graphs [Beck et al., 2018; Song et al., 2018a; Song et al., 2019], dependency trees [Bastings et al., 2017; Song et al., 2018b], knowledge graphs, and even sentences [Xu et al., 2018]. In particular, Zhang et al. [2018] proposed GRN to represent raw sentences by building a graph structure of neighboring words and a sentence-level node.…”
Section: Ablation Study (mentioning)
confidence: 99%
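The graph construction this statement attributes to Zhang et al. [2018] is easy to picture in code. The sketch below is an illustration under assumptions, not their implementation: neighboring words are linked within a small window, and one extra sentence-level node connects to every word; the function name and the `window` parameter are invented for the example.

```python
def build_sentence_graph(tokens, window=1):
    """Build a word graph with a sentence-level node (illustrative).

    Nodes 0..n-1 are words; node n is the sentence-level node,
    connected to every word. Words within `window` positions of
    each other are neighbours. Returns an adjacency list.
    """
    n = len(tokens)
    sent = n  # index of the added sentence-level node
    adj = {i: set() for i in range(n + 1)}
    for i in range(n):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if i != j:
                adj[i].add(j)       # neighbouring-word edge
        adj[i].add(sent)            # word <-> sentence node
        adj[sent].add(i)
    return adj
```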
“…Some research has started to explore graph convolutional networks that are more suitable for text classification. GCNs were first used to capture syntactic structure in [3], producing representations of words and showing improvements. The method [13] mentioned in the last paragraph applies GCNs to text classification, but it cannot naturally support edge features.…”
Section: Dual-atten (mentioning)
confidence: 99%
“…However, these deep neural networks cannot model the irregular structure of texts well, which is crucial for the text recognition task. Recently, graph convolutional networks (GCNs) [3,13] have been proposed with great success on various tasks, and have also been applied to the feature representation of texts. On the other hand, due to the difficulty of modeling data variance, the attention mechanism [18,2,23] has been proposed and is widely embedded in multiple models, achieving promising results on a variety of tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Similar to HiSAN, Hashimoto and Tsuruoka (2017) use dependency features as attention distributions, but unlike HiSAN, they use pre-trained dependency relations and do not take the chains of dependencies into account. … Bastings et al. (2017) consider higher-order dependency relationships in Seq2Seq by incorporating a graph convolution technique (Kipf and Welling, 2016) into the encoder. However, the dependency information for the graph convolution technique is still provided in a pipeline manner.…”
Section: Related Work (mentioning)
confidence: 99%
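For intuition, here is a hedged sketch of "dependency features as attention distributions" in the spirit of the Hashimoto and Tsuruoka (2017) idea this statement mentions: a pre-trained parser's head probabilities serve directly as a fixed attention matrix over source words. The function name and the row-stochastic `parent_probs` input are assumptions of this illustration, not the authors' exact formulation.

```python
import numpy as np

def dependency_attention(H, parent_probs):
    """Attend from each word to its likely syntactic head (sketch).

    H            : (n, d) encoder states.
    parent_probs : (n, n) row-stochastic matrix; parent_probs[i, j]
                   is the parser's probability that word j is the
                   head of word i (assumed given by a pre-trained
                   dependency parser).
    Returns (n, d) context vectors: row i is the expected
    representation of word i's head under the parser's distribution.
    """
    return parent_probs @ H
```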