Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-2078

Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks

Abstract: Semantic representations have long been argued as potentially useful for enforcing meaning preservation and improving generalization performance of machine translation methods. In this work, we are the first to incorporate information about predicate-argument structure of source sentences (namely, semantic-role representations) into neural machine translation. We use Graph Convolutional Networks (GCNs) to inject a semantic bias into sentence encoders and achieve improvements in BLEU scores over the linguistic-…
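
To make the idea concrete, here is a minimal sketch of a GCN layer that injects predicate-argument structure into encoder states. This is an illustration under assumptions, not the authors' implementation: the class and parameter names are invented, and a single shared weight matrix stands in for the per-direction, per-label weights and edge gates of the actual semantic GCN.

```python
import torch
import torch.nn as nn

class SemanticGCNLayer(nn.Module):
    """Sketch of one GCN layer over a predicate-argument graph.

    Simplification (assumed): one weight matrix shared across all edge
    labels and directions; the paper's GCN uses per-label parameters
    and scalar edge gates on top of this basic message-passing step.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.neighbor = nn.Linear(hidden_size, hidden_size)
        self.self_loop = nn.Linear(hidden_size, hidden_size)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (batch, seq_len, hidden) encoder states, e.g. BiLSTM outputs
        # adj: (batch, seq_len, seq_len) with 1.0 where a semantic-role
        #      edge links token j to token i (including inverse edges)
        messages = torch.matmul(adj, self.neighbor(h))   # sum over neighbors
        degree = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.self_loop(h) + messages / degree)
```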


Cited by 190 publications (95 citation statements)
References 19 publications (30 reference statements)
“…Using Full as the training data, the scores become 23.3, 23.9, 24.5 and 24.9, respectively. Apart from using a different semantic representation (AMR vs. SRL), Marcheggiani et al. (2018) stacked graph convolutional network (GCN) (Kipf and Welling, 2017) layers on top of a bidirectional LSTM (BiLSTM) layer and concatenated the layer outputs as the attention memory. The GCN layers encode the semantic role information, while the BiLSTM layers encode the source sentence, so the concatenated hidden states of both layers carry information from both the semantic roles and the source sentence.…”
Section: Results
Confidence: 99%
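
The stacking described above can be sketched as follows. This reuses the hypothetical SemanticGCNLayer from the sketch after the abstract, and all names and hyperparameters are assumptions rather than the cited configuration:

```python
import torch
import torch.nn as nn

class BiLSTMGCNEncoder(nn.Module):
    """Sketch: a BiLSTM encodes the source sentence, GCN layers on top
    encode the semantic-role graph, and the two outputs are concatenated
    to form the attention memory, per the description above."""

    def __init__(self, emb_size: int, hidden_size: int, num_gcn_layers: int = 2):
        super().__init__()
        self.bilstm = nn.LSTM(emb_size, hidden_size // 2,
                              batch_first=True, bidirectional=True)
        self.gcn_layers = nn.ModuleList(
            [SemanticGCNLayer(hidden_size) for _ in range(num_gcn_layers)])

    def forward(self, embeddings: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, emb_size)
        lstm_out, _ = self.bilstm(embeddings)      # (batch, seq_len, hidden)
        h = lstm_out
        for gcn in self.gcn_layers:
            h = gcn(h, adj)                        # add semantic-role structure
        # attention memory: sentence encoding alongside semantic encoding
        return torch.cat([lstm_out, h], dim=-1)    # (batch, seq_len, 2*hidden)
```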
“…On the other hand, exploring semantics for NMT has so far received relatively little attention. Recently, Marcheggiani et al. (2018) exploited semantic role labeling (SRL) for NMT, showing that the predicate-argument information from SRL can improve the performance of an attention-based sequence-to-sequence model by alleviating the “argument switching” problem (flipping arguments corresponding to different roles), a frequent and severe issue for NMT systems (Isabelle et al., 2017). Figure 1(a) shows one example of semantic role information, which only captures the relations between a predicate (gave) and its arguments (John, wife and present).…”
Section: Introduction
Confidence: 99%
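
As a small illustration of what such predicate-argument information looks like as input to a graph encoder, the example sentence can be written as labeled edges from the predicate to its arguments. The PropBank-style role labels below are assumed for illustration, not taken from the cited figure:

```python
# Hypothetical SRL annotation for "John gave his wife a present".
# ARG0/ARG1/ARG2 labels are assumed PropBank-style roles.
srl_edges = [
    ("gave", "John", "ARG0"),     # giver
    ("gave", "present", "ARG1"),  # thing given
    ("gave", "wife", "ARG2"),     # recipient
]

tokens = ["John", "gave", "his", "wife", "a", "present"]
index = {tok: i for i, tok in enumerate(tokens)}

# Labeled adjacency a GCN encoder could consume, keyed by token positions.
adjacency = {(index[p], index[a]): role for p, a, role in srl_edges}
print(adjacency)  # {(1, 0): 'ARG0', (1, 5): 'ARG1', (1, 3): 'ARG2'}
```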
“…Bastings et al. [153] apply the Syntactic GCN to the task of neural machine translation. Marcheggiani et al. [154] further adopt the same model as Bastings et al. [153] to handle the semantic dependency graph of a sentence.…”
Section: Practical Applications
Confidence: 99%
“…GNN for NLP: Recently, there has been considerable interest in applying GNNs to NLP tasks, with great success. For example, in neural machine translation, GNNs have been employed to integrate syntactic and semantic information into encoders (Bastings et al., 2017; Marcheggiani et al., 2018); GNNs have also been applied to relation extraction over pruned dependency trees; Yao et al. (2018) employed a GNN over a heterogeneous graph for text classification, which inspires our idea of the HDE graph; and Liu et al. (2018) proposed a new contextualized neural network for sequence learning that leverages various types of non-local contextual information via message passing over a GNN. These studies are related to our work in that both use GNNs to improve information interaction over long contexts or across documents.…”
Section: Related Work
Confidence: 99%