Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015
DOI: 10.3115/v1/p15-1166
Multi-Task Learning for Multiple Language Translation

Abstract: In this paper, we investigate the problem of learning a machine translation model that can simultaneously translate sentences from one source language to multiple target languages. Our solution is inspired by the recently proposed neural machine translation model which generalizes machine translation as a sequence learning problem. We extend the neural machine translation to a multi-task learning framework which shares source language representation and separates the modeling of different target language translation…
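The abstract describes a one-to-many setup: a single encoder shared across all language pairs, with a separate decoder per target language. Below is a minimal sketch of that idea in PyTorch; it is not the authors' implementation, and the module names, dimensions, and GRU-based encoder/decoder choice are illustrative assumptions (any attention mechanism is omitted for brevity).

```python
# Sketch of one-to-many multi-task NMT: one shared source encoder,
# one decoder per target language. All names/sizes are assumptions.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) -> final hidden state (1, batch, hid_dim)
        _, hidden = self.rnn(self.embed(src_ids))
        return hidden

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, tgt_ids, hidden):
        # Condition on the shared encoder state, predict next-token logits.
        output, _ = self.rnn(self.embed(tgt_ids), hidden)
        return self.out(output)  # (batch, tgt_len, tgt_vocab)

class MultiTargetNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocabs):
        super().__init__()
        self.encoder = SharedEncoder(src_vocab)          # shared across all pairs
        self.decoders = nn.ModuleDict({                  # one decoder per target language
            lang: Decoder(v) for lang, v in tgt_vocabs.items()
        })

    def forward(self, src_ids, tgt_ids, lang):
        return self.decoders[lang](tgt_ids, self.encoder(src_ids))

# Toy usage: an English encoder feeding French and Spanish decoders.
model = MultiTargetNMT(src_vocab=1000, tgt_vocabs={"fr": 1200, "es": 1100})
src = torch.randint(0, 1000, (4, 7))
tgt = torch.randint(0, 1200, (4, 9))
logits = model(src, tgt, lang="fr")                      # (4, 9, 1200)
loss = nn.functional.cross_entropy(logits.reshape(-1, 1200), tgt.reshape(-1))
```

During training, mini-batches from different bilingual corpora would update the shared encoder jointly while each decoder sees only its own target language.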

Cited by 509 publications (445 citation statements)
References 11 publications
“…Multi-task learning was shown to be effective for a variety of NLP tasks, such as POS tagging, chunking, named entity recognition (Collobert et al., 2011) or sentence compression (Klerke et al., 2016). It has also been used in encoder-decoder architectures, typically for machine translation (Dong et al., 2015; Luong et al., 2016), though so far not with attentional decoders.…”
Section: Related Work
confidence: 99%
“…Furthermore, their method requires training an additional NMT model from the target language to the source language, which may negatively influence the attention model in the decoder network. Dong et al. (2015) propose a multi-task learning method for translating one source language into multiple target languages in NMT, so that the encoder network can be shared when dealing with several sets of bilingual data. …, and Firat et al. (2016) further deal with more complicated cases (e.g.…”
Section: Related Work
confidence: 99%
“…In its simplest form, our model exploits a one-to-one NMT architecture: the source English sentence is translated into k candidate foreign sentences and then back-translated into English. Inspired by multi-way machine translation, which has shown performance gains over single-pair models (Zoph and Knight, 2016; Dong et al., 2015; Firat et al., 2016a), we also explore an alternative pivoting technique which uses multiple languages rather than a single one. Our model inherits advantages from NMT such as a small memory footprint and conceptually easy decoding (implemented as beam search).…”
Section: Related Work
confidence: 99%
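The last excerpt describes a round-trip pivoting scheme: translate the English source into k candidate foreign sentences with beam search, then back-translate them into English. The sketch below illustrates that general idea with off-the-shelf MarianMT checkpoints from the transformers library; the model names, the choice of German as pivot, and the decoding parameters are assumptions for demonstration, not the cited authors' multi-way system.

```python
# Round-trip pivoting sketch (illustrative, not the cited system):
# English -> k German candidates via beam search -> back to English.
from transformers import MarianMTModel, MarianTokenizer

def round_trip(sentence, k=3):
    fwd_name, bwd_name = "Helsinki-NLP/opus-mt-en-de", "Helsinki-NLP/opus-mt-de-en"
    fwd_tok = MarianTokenizer.from_pretrained(fwd_name)
    fwd = MarianMTModel.from_pretrained(fwd_name)
    bwd_tok = MarianTokenizer.from_pretrained(bwd_name)
    bwd = MarianMTModel.from_pretrained(bwd_name)

    # English -> k German candidates (beam search keeps k best hypotheses).
    enc = fwd_tok([sentence], return_tensors="pt", padding=True)
    de_ids = fwd.generate(**enc, num_beams=max(k, 4), num_return_sequences=k)
    de_cands = fwd_tok.batch_decode(de_ids, skip_special_tokens=True)

    # German candidates -> back-translated English paraphrases.
    enc_back = bwd_tok(de_cands, return_tensors="pt", padding=True)
    en_ids = bwd.generate(**enc_back, num_beams=4)
    return bwd_tok.batch_decode(en_ids, skip_special_tokens=True)

print(round_trip("The committee approved the proposal yesterday."))
```

Using several pivot languages instead of one, as the excerpt suggests, would amount to repeating the forward/backward step with additional language pairs and pooling the back-translated candidates.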