Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2017
DOI: 10.18653/v1/p17-2012
Learning to Parse and Translate Improves Neural Machine Translation

Abstract: There has been relatively little attention to incorporating linguistic priors into neural machine translation. Much of the previous work was further constrained to considering linguistic priors on the source side only. In this paper, we propose a hybrid model, called NMT+RNNG, that learns to parse and translate by combining a recurrent neural network grammar with attention-based neural machine translation. Our approach encourages the neural machine translation model to incorporate linguistic priors during training…
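To make the abstract's multi-task idea concrete, here is a minimal sketch of a joint objective in which a translation (word-prediction) loss and a parsing (action-prediction) loss are summed so that gradients from both tasks update the shared model. The class name, the padding index, the weighting scalar, and the tensor shapes are illustrative assumptions, not the authors' exact implementation.

```python
import torch.nn as nn

class JointTranslateParseLoss(nn.Module):
    """Sum of the word-prediction loss and the parse-action loss (sketch)."""

    def __init__(self, parse_weight: float = 1.0):
        super().__init__()
        self.word_loss = nn.CrossEntropyLoss(ignore_index=0)    # 0 = padding id (assumed)
        self.action_loss = nn.CrossEntropyLoss(ignore_index=0)
        self.parse_weight = parse_weight                          # hypothetical task weight

    def forward(self, word_logits, word_targets, action_logits, action_targets):
        # word_logits:   (batch, tgt_len, vocab)     decoder word scores
        # action_logits: (batch, n_steps, n_actions) parser action scores
        l_trans = self.word_loss(word_logits.flatten(0, 1), word_targets.flatten())
        l_parse = self.action_loss(action_logits.flatten(0, 1), action_targets.flatten())
        return l_trans + self.parse_weight * l_parse
```

Summing the two losses lets the shared encoder (and any shared decoder states) receive supervision from both the translation references and the parse annotations during training.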

Cited by 128 publications (116 citation statements)
References 28 publications (24 reference statements)
“…In contrast to these approaches, the DSA-LSTM only models the probability of surface strings, albeit with an auxiliary loss that distills the next-word predictive distribution of a syntactic language model. Earlier work has also explored multi-task learning with syntactic objectives as an auxiliary loss in language modelling and machine translation (Luong et al., 2016; Eriguchi et al., 2016; Nadejde et al., 2017; Enguehard et al., 2017; Aharoni and Goldberg, 2017; Eriguchi et al., 2017). Our approach of injecting syntactic bias through a KD objective is orthogonal to this approach, with the primary difference that here the student DSA-LSTM has no direct access to syntactic annotations; it does, however, have access to the teacher RNNG's softmax distribution over the next word.…”
Section: Related Work
confidence: 99%
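The distillation mechanism mentioned in this statement can be illustrated with a small loss function: the student language model is trained on the surface string while an auxiliary term pushes its predictive distribution toward the teacher's softmax over the next word. The function name, the interpolation weight, and the tensor shapes below are assumptions for illustration, not details taken from the cited paper.

```python
import torch.nn.functional as F

def lm_with_distillation_loss(student_logits, teacher_probs, gold_ids, alpha=0.5):
    """Next-word loss for a student LM with an auxiliary distillation term (sketch).

    student_logits: (batch, seq, vocab) raw student scores
    teacher_probs:  (batch, seq, vocab) teacher's next-word distribution,
                    assumed precomputed and detached from the graph
    gold_ids:       (batch, seq) observed next-word indices
    alpha:          interpolation weight (an assumption, not a reported value)
    """
    vocab = student_logits.size(-1)
    # Ordinary cross-entropy against the observed surface string.
    ce = F.cross_entropy(student_logits.reshape(-1, vocab), gold_ids.reshape(-1))
    # Distillation term: cross-entropy against the teacher's full distribution,
    # which differs from KL(teacher || student) only by the teacher's entropy.
    log_student = F.log_softmax(student_logits, dim=-1)
    kd = -(teacher_probs * log_student).sum(dim=-1).mean()
    return (1.0 - alpha) * ce + alpha * kd
```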
“…In contrast to these approaches, the DSA-LSTM only models the probability of surface strings, albeit with an auxiliary loss that distills the next-word predictive distribution of a syntactic language model. Earlier work has also explored multi-task learning with syntactic objectives as an auxiliary loss in language modelling and machine translation (Luong et al, 2016;Eriguchi et al, 2016;Nadejde et al, 2017;Enguehard et al, 2017;Aharoni and Goldberg, 2017;Eriguchi et al, 2017). Our approach of injecting syntactic bias through a KD objective is orthogonal to this approach, with the primary difference that here the student DSA-LSTM has no direct access to syntactic annotations; it does, however, have access to the teacher RNNG's softmax distribution over the next word.…”
Section: Related Workmentioning
confidence: 99%
“…We then experiment in a low-resource scenario using the German, Russian and Czech to English training data from the News Commentary v8 corpus, following Eriguchi et al. (2017). In all cases we parse the English sentences into constituency trees using the BLLIP parser (Charniak and Johnson, 2005).…”
Section: Experiments and Results
confidence: 99%
“…In parallel and highly related to our work, Eriguchi et al. (2017) proposed to model the target syntax in NMT in the form of dependency trees by using an RNNG-based decoder (Dyer et al., 2016), while Nadejde et al. (2017) incorporated target syntax by predicting CCG tags serialized into the target translation. Our work differs from those by modeling syntax using constituency trees, as was previously common in the "traditional" syntax-based machine translation literature.…”
Section: Introduction and Model
confidence: 99%
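The "CCG tags serialized into the target translation" approach mentioned in this statement amounts to a data-preparation step: each target word is paired with a syntactic tag so that an ordinary sequence-to-sequence decoder predicts both. The toy function below sketches one such interleaving under that assumption; the tags in the example are placeholders, not real parser output.

```python
def interleave_tags(words, tags):
    """Pair each target word with its tag and flatten into one token sequence."""
    assert len(words) == len(tags)
    out = []
    for tag, word in zip(tags, words):
        out.extend([tag, word])
    return out

# Example with made-up CCG-style tags for "the cat sleeps":
print(interleave_tags(["the", "cat", "sleeps"], ["NP/N", "N", "S\\NP"]))
# ['NP/N', 'the', 'N', 'cat', 'S\\NP', 'sleeps']
```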
“…Similarly, [35] incorporate linguistic annotation into the semantic role labeling task. [9] combined translation and dependency parsing by sharing the translation encoder hidden states with the buffer hidden states in a shift-reduce parsing model [8]. Aiming at the same goal, [1] proposed a very simple method.…”
Section: Related Work
confidence: 99%
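The sharing described by this statement, one encoder whose per-token hidden states feed both the translation decoder's attention and the shift-reduce parser's buffer, can be sketched as below. This is a minimal illustration under assumed shapes and names (a bidirectional LSTM encoder, a projection layer), not the architecture reported in the cited work.

```python
import torch.nn as nn

class SharedEncoder(nn.Module):
    """One encoder whose states serve both translation and parsing (sketch)."""

    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, src_ids):
        states, _ = self.rnn(self.embed(src_ids))
        states = self.proj(states)      # (batch, src_len, dim)
        attention_memory = states       # consumed by the NMT decoder's attention
        parser_buffer = states          # consumed as the shift-reduce parser's buffer
        return attention_memory, parser_buffer
```

Because both tasks read the same tensor, parsing supervision shapes the representations that the translation decoder attends over, which is the intended effect of the multi-task setup.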