Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1012

Neural Machine Translation with Source-Side Latent Graph Parsing

Abstract: This paper presents a novel neural machine translation model which jointly learns translation and source-side latent graph representations of sentences. Unlike existing pipelined approaches using syntactic parsers, our end-to-end model learns a latent graph parser as part of the encoder of an attention-based neural machine translation model, and thus the parser is optimized according to the translation objective. In experiments, we first show that our model compares favorably with state-of-the-art sequential a…
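As a rough illustration of the latent graph parsing idea in the abstract, the sketch below (not the paper's actual architecture; the pair scorer, the self-head masking, and the way the graph feeds the decoder are simplifying assumptions) scores every head/dependent pair of encoder states, softmax-normalizes each row into a soft head distribution, and concatenates the expected head representation to each word's state. Because the whole computation is differentiable, the translation loss can train the scorer end to end, which is the point the abstract makes against pipelined parsers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latent_graph_encode(H, W_dep, exclude_self=True):
    """Soft 'latent graph' layer over encoder states (illustrative sketch).

    H:     (n, d) sequential encoder states (e.g. from a bi-RNN).
    W_dep: (d, d) parameter scoring head/dependent pairs; a simplification of
           the paper's parsing component, assumed here for illustration.

    Returns the (n, n) soft head distribution P and (n, 2d) graph-aware states
    [H ; P @ H] that a downstream attention/decoder could consume.
    """
    n, d = H.shape
    scores = H @ W_dep @ H.T                      # scores[i, j]: word j as head of word i
    if exclude_self:
        scores[np.arange(n), np.arange(n)] = -1e9  # a word is not its own head
    P = softmax(scores, axis=1)                   # each row: distribution over candidate heads
    head_ctx = P @ H                              # expected head representation per word
    return P, np.concatenate([H, head_ctx], axis=1)

# toy usage
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))
W = rng.standard_normal((8, 8)) * 0.1
P, H_graph = latent_graph_encode(H, W)
print(P.shape, H_graph.shape)   # (5, 5) (5, 16)
```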

Cited by 47 publications (47 citation statements). References: 30 publications.
“…As a consequence, most of the research uses dependency graph information as an external feature or carefully engineers more compact features extracted from the dependency tree arcs [24], [25]. On the other hand, some research adopts input latent graph parsing [33] as the syntax representation. Inducing the dependency tree in a principled manner while training allows the model to learn the internal representation of the sentence very well [31], [34].…”
Section: Structured Attention (mentioning)
confidence: 99%
“…The probability for a packed d-length dependency chain is obtained from a dependency graph, which is an edge-factored dependency score matrix (Hashimoto and Tsuruoka, 2017; Zhang et al., 2017). First, we explain the dependency graph.…”
Section: Packed D-length Dependency Chain (mentioning)
confidence: 99%
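A minimal sketch of the edge-factored view referred to in this statement: row-normalizing a dependency score matrix gives, for each word, a distribution over candidate heads, and under one simplified reading (an assumption here, not the cited papers' exact construction) a d-length dependency chain distribution is just the d-th power of that soft head matrix.

```python
import numpy as np

def head_distribution(scores):
    """Row-normalize an edge-factored dependency score matrix.
    scores[i, j] = score of word j being the head of word i."""
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def chain_distribution(P, d):
    """Probability of reaching each word by following d soft head steps;
    a simplified stand-in for a 'packed d-length dependency chain'."""
    return np.linalg.matrix_power(P, d)

rng = np.random.default_rng(1)
scores = rng.standard_normal((6, 6))   # e.g. from a pairwise scorer over encoder states
P1 = head_distribution(scores)         # direct heads (d = 1)
P2 = chain_distribution(P1, 2)         # grandparents via two soft steps (d = 2)
print(P1.sum(axis=1))                  # each row sums to 1
```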
“…Thus, these methods cannot track all possible parents for each word within the decoding process. Similar to HiSAN, Hashimoto and Tsuruoka (2017) use dependency features as attention distributions, but different from HiSAN, they use pre-trained dependency relations and do not take into account the chains of dependencies. Bastings et al. (2017) consider higher-order dependency relationships in Seq2Seq by incorporating a graph convolution technique (Kipf and Welling, 2016) into the encoder.…”
Section: Related Work (mentioning)
confidence: 99%
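The graph-convolution idea mentioned in this statement can be sketched as follows. This is a bare Kipf-and-Welling-style layer over a dependency tree; the syntactic GCN of Bastings et al. (2017) additionally uses edge directions, labels, and gating, so the details below are simplifying assumptions.

```python
import numpy as np

def gcn_layer(X, heads, W, b):
    """One graph-convolution layer over a dependency tree (rough sketch).

    X:     (n, d_in) input word/encoder representations.
    heads: length-n list, heads[i] = index of word i's head (-1 for root).
    W, b:  (d_in, d_out) weights and (d_out,) bias.
    """
    n = X.shape[0]
    A = np.eye(n)                       # self loops
    for i, h in enumerate(heads):
        if h >= 0:
            A[i, h] = A[h, i] = 1.0     # undirected dependency edges
    deg = A.sum(axis=1, keepdims=True)
    return np.maximum((A / deg) @ X @ W + b, 0.0)   # mean-aggregate neighbours, then ReLU

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 8))         # e.g. bi-RNN states for a 4-word sentence
heads = [1, -1, 1, 2]                   # a toy dependency tree (word 1 is the root)
H = gcn_layer(X, heads, rng.standard_normal((8, 8)) * 0.1, np.zeros(8))
print(H.shape)                          # (4, 8) graph-aware encoder states
```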
“…However, in most existing NMT models, source sentences are treated as sequences and syntactic knowledge is neglected. Some effort has been made to incorporate source syntax into NMT to enhance the attention model [Eriguchi et al., 2016b; Hashimoto and Tsuruoka, 2017; Sennrich and Haddow, 2016]. [Eriguchi et al., 2016b] proposed a tree-to-sequence attentional NMT model in which a source-side parse tree was used, achieving promising improvements.…”
Section: Related Work (mentioning)
confidence: 99%
“…[Sennrich and Haddow, 2016] incorporated linguistic features to improve NMT performance by appending feature vectors to word embeddings. [Hashimoto and Tsuruoka, 2017] proposed a multi-task framework to learn both source parsing and translation. Different from previous syntax-based work, in this paper we focus on improving the NMT encoder with source-side long-distance word dependencies.…”
Section: Related Work (mentioning)
confidence: 99%
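The feature-appending scheme attributed to [Sennrich and Haddow, 2016] above amounts to concatenating embeddings of linguistic features to the word embedding before encoding. The sketch below uses a single POS feature with toy dimensions, which is an illustrative assumption rather than that paper's actual feature set or configuration.

```python
import numpy as np

def embed_with_features(word_ids, pos_ids, E_word, E_pos):
    """Concatenate a word embedding with a linguistic-feature embedding
    (only POS here, for illustration; lemmas, morphology, and dependency
    labels could be appended the same way) before feeding the encoder."""
    return np.concatenate([E_word[word_ids], E_pos[pos_ids]], axis=1)

rng = np.random.default_rng(3)
E_word = rng.standard_normal((1000, 12))   # toy vocabulary of 1000 words
E_pos = rng.standard_normal((20, 4))       # toy set of 20 POS tags
x = embed_with_features(np.array([5, 42, 7]), np.array([1, 3, 1]), E_word, E_pos)
print(x.shape)                             # (3, 16): 12-dim word + 4-dim feature
```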