Proceedings of the Second Conference on Machine Translation 2017
DOI: 10.18653/v1/w17-4707
Predicting Target Language CCG Supertags Improves Neural Machine Translation

Abstract: Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask training? We introduce syntactic information in the form of CCG supertags in the decoder, by interleaving the target supertags…
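The abstract's central idea is to have the decoder emit the target-side CCG supertag sequence interleaved with the word sequence, so that one standard sequence decoder predicts both. The sketch below is a minimal illustration of such interleaving, assuming one supertag per target word and a supertag-before-word ordering; the function name and example tags are our own and not taken from the paper.

```python
# Minimal sketch (assumption, not the authors' code) of interleaving
# CCG supertags with target words so a standard sequence decoder can
# predict both, as described in the abstract.

def interleave_supertags(words, supertags):
    """Produce one flat target sequence in which supertag_i precedes word_i."""
    assert len(words) == len(supertags)
    sequence = []
    for tag, word in zip(supertags, words):
        sequence.append(tag)   # e.g. "NP", "(S\\NP)/NP"
        sequence.append(word)
    return sequence

# Example: "Peter reads books"
words = ["Peter", "reads", "books"]
supertags = ["NP", "(S\\NP)/NP", "NP"]
print(interleave_supertags(words, supertags))
# ['NP', 'Peter', '(S\\NP)/NP', 'reads', 'NP', 'books']
```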


Cited by 62 publications (51 citation statements). References 26 publications (37 reference statements).
“…In contrast to these approaches, the DSA-LSTM only models the probability of surface strings, albeit with an auxiliary loss that distills the next-word predictive distribution of a syntactic language model. Earlier work has also explored multi-task learning with syntactic objectives as an auxiliary loss in language modelling and machine translation (Luong et al., 2016; Eriguchi et al., 2016; Nadejde et al., 2017; Enguehard et al., 2017; Aharoni and Goldberg, 2017; Eriguchi et al., 2017). Our approach of injecting syntactic bias through a KD objective is orthogonal to this approach, with the primary difference that here the student DSA-LSTM has no direct access to syntactic annotations; it does, however, have access to the teacher RNNG's softmax distribution over the next word.…”
Section: Related Work
confidence: 99%
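The quoted passage contrasts multi-task syntactic objectives with a knowledge-distillation (KD) objective in which the student only sees the teacher's softmax over the next word. Below is a minimal sketch of such a distillation-style auxiliary loss, assuming a PyTorch setup; the tensor names and the interpolation weight alpha are illustrative and not the DSA-LSTM authors' implementation.

```python
# Hedged sketch (not the cited authors' code) of a distillation-style
# auxiliary loss: the student matches the teacher's next-word softmax
# in addition to the usual language-modelling cross-entropy.
import torch
import torch.nn.functional as F

def lm_loss_with_distillation(student_logits, teacher_logits, targets, alpha=0.5):
    """Interpolate cross-entropy on gold words with KL to the teacher's distribution."""
    ce = F.cross_entropy(student_logits, targets)            # standard next-word loss
    kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1),
                  reduction="batchmean")                      # match teacher softmax
    return alpha * ce + (1.0 - alpha) * kd
```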
“…Section 10 sheds light on the overall patterns that arise from Table 1: Example sentence with different word-level annotations. The CCG supertags are taken from Nadejde et al. (2017). POS and semantic tags are our own annotation, as well as the German translation and its morphological tags.…”
Section: Introduction
confidence: 99%
“…NMT systems have outperformed the state-of-the-art SMT model on various language pairs in terms of translation quality (Luong et al., 2015; Bentivogli et al., 2016; Junczys-Dowmunt et al., 2016; Wu et al., 2016; Toral and Sánchez-Cartagena, 2017). However, due to some deficiencies of NMT systems, such as the limited vocabulary size and low adequacy of some translations, much research has focused on incorporating extra knowledge such as SMT features or linguistic features into NMT to improve translation performance (He et al., 2016; Sennrich and Haddow, 2016; Nadejde et al., 2017; Wang et al., 2017).…”
Section: Introduction
confidence: 99%