Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1078

Tree-to-Sequence Attentional Neural Machine Translation

Abstract: Most of the existing Neural Machine Translation (NMT) models focus on the conversion of sequential data and do not directly use syntactic information. We propose a novel end-to-end syntactic NMT model, extending a sequence-to-sequence model with the source-side phrase structure. Our model has an attention mechanism that enables the decoder to generate a translated word while softly aligning it with phrases as well as words of the source sentence. Experimental results on the WAT'15 English-to-Japanese dataset dem…
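The attention described in the abstract can be made concrete with a short sketch: the decoder scores every word vector and every phrase vector from the source encoder under a single softmax, so a target word can softly align with either granularity. The NumPy code below is a minimal illustration under assumed shapes and dot-product scoring; it is not the authors' implementation (the paper uses learned attention parameters and Tree-LSTM encoder states).

```python
# Minimal sketch (NumPy) of attention over both word- and phrase-level
# source representations, in the spirit of tree-to-sequence NMT.
# Shapes, names, and the dot-product scorer are illustrative assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def tree_attention(decoder_state, word_states, phrase_states):
    """Attend over the concatenation of word and phrase vectors.

    decoder_state : (d,)           current decoder hidden state
    word_states   : (n_words, d)   encoder states for source words (leaves)
    phrase_states : (n_phrases, d) encoder states for source phrases
                                   (internal parse-tree nodes)
    Returns a context vector of shape (d,).
    """
    # One attention distribution spans words and phrases jointly,
    # so the decoder can softly align with either granularity.
    states = np.vstack([word_states, phrase_states])  # (n_words + n_phrases, d)
    scores = states @ decoder_state                   # dot-product scoring (assumption)
    weights = softmax(scores)
    return weights @ states                           # weighted sum -> context vector

# Toy usage: 4 source words, 3 phrase nodes, hidden size 8.
rng = np.random.default_rng(0)
ctx = tree_attention(rng.normal(size=8),
                     rng.normal(size=(4, 8)),
                     rng.normal(size=(3, 8)))
print(ctx.shape)  # (8,)
```

Placing words and phrases in one attention distribution, rather than two separate ones, is the key idea: the model is free to attend to a whole subtree when a single target word translates a multi-word source phrase.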

Cited by 226 publications (200 citation statements). References 19 publications.
“…Some effort has been made to incorporate source syntax into NMT. Eriguchi et al. (2016) proposed a tree-to-sequence attentional NMT model in which a source-side parse tree was used, and achieved a promising improvement. Intuitively, adding source syntactic information to [Source] Only when the safety of the construction workers is guaranteed can construction continue.…”
Section: Related Work (mentioning)
confidence: 99%
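For readers unfamiliar with how a source-side parse tree becomes vectors the decoder can attend to, the sketch below encodes a binarized tree bottom-up: each internal node's vector is computed from its two children, yielding one phrase vector per node. The single tanh combiner is a simplified stand-in for the Tree-LSTM units Eriguchi et al. (2016) actually use; the weights, shapes, and helper names are assumptions for illustration.

```python
# Minimal sketch (NumPy) of bottom-up phrase encoding over a binarized
# parse tree. A tanh layer replaces the paper's Tree-LSTM units; all
# parameters here are illustrative assumptions.
import numpy as np

def encode_tree(node, leaf_vecs, W, b):
    """node is either an int (index into leaf_vecs) or a (left, right)
    pair of sub-trees. Returns (node_vector, list_of_phrase_vectors)."""
    if isinstance(node, int):
        return leaf_vecs[node], []
    left, right = node
    lv, lp = encode_tree(left, leaf_vecs, W, b)
    rv, rp = encode_tree(right, leaf_vecs, W, b)
    # Combine the two child vectors into a phrase vector for this node.
    parent = np.tanh(W @ np.concatenate([lv, rv]) + b)
    return parent, lp + rp + [parent]

# Toy usage: 4 source words, tree ((0,1),(2,3)), hidden size 8.
rng = np.random.default_rng(0)
d = 8
leaves = rng.normal(size=(4, d))
W, b = rng.normal(size=(d, 2 * d)) * 0.1, np.zeros(d)
root, phrases = encode_tree(((0, 1), (2, 3)), leaves, W, b)
print(len(phrases), root.shape)  # 3 phrase vectors, (8,)
```

The phrase vectors produced here are exactly what the attention sketch above stacks alongside the word vectors.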
“…For example, Part-Of-Speech (POS) tags are used for syntactic parsers. The parsers are in turn used to improve higher-level tasks, such as natural language inference (Chen et al., 2016) and machine translation (Eriguchi et al., 2016). These systems are often pipelines and are not trained end-to-end.…”
Section: Introduction (mentioning)
confidence: 99%
“…The experimental results show that our best model outperforms the best single NMT model reported in WAT '16 (Eriguchi et al., 2016b).…”
Section: Introduction (mentioning)
confidence: 84%
“…Eriguchi et al. (2016a)'s baseline system (the first line in Table 3) was the best single (i.e., without ensembling) word-based NMT system reported in WAT '16. For a fairer evaluation, we also reimplemented a standard attention-based NMT system that uses exactly the same encoder, training procedure, and hyperparameters as our proposed models, but has a word-based decoder.…”
Section: Baseline Systems (mentioning)
confidence: 99%