Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1120

Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies

Abstract: Transfer learning or multilingual models are essential for low-resource neural machine translation (NMT), but their applicability is limited to cognate languages that share vocabularies. This paper shows effective techniques to transfer a pre-trained NMT model to a new, unrelated language without shared vocabularies. We relieve the vocabulary mismatch by using cross-lingual word embeddings, train a more language-agnostic encoder by injecting artificial noises, and generate synthetic data easily from the pre-tr…
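As a rough illustration of one technique named in the abstract, the sketch below corrupts source-side token sequences with random deletion, filler insertion, and bounded local reordering before they reach the encoder. The noise types, the rates (p_drop, p_insert, max_shuffle_dist), and the <blank> filler token are assumptions for illustration, not the paper's exact recipe.

```python
import random

def add_artificial_noise(tokens, p_drop=0.1, p_insert=0.1, max_shuffle_dist=3, filler="<blank>"):
    """Corrupt a tokenized source sentence with simple artificial noise (illustrative sketch)."""
    # Randomly drop tokens.
    noisy = [t for t in tokens if random.random() > p_drop]
    # Randomly insert a filler token before some positions.
    out = []
    for t in noisy:
        if random.random() < p_insert:
            out.append(filler)
        out.append(t)
    # Locally shuffle tokens, each moving at most max_shuffle_dist positions.
    keys = [i + random.uniform(0, max_shuffle_dist) for i in range(len(out))]
    return [t for _, t in sorted(zip(keys, out), key=lambda kv: kv[0])]

# Example: corrupt a source sentence before feeding it to the encoder.
print(add_artificial_noise("this is a low resource sentence".split()))
```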

Cited by 62 publications (65 citation statements). References 38 publications.
“…LSTM is a type of recurrent neural network (RNN). LSTM has achieved great success in many applications, such as unconstrained handwriting recognition [46], speech recognition [47], handwriting generation [35], machine translation [48], etc. Each step of the LSTM applies the same repeated neural network module.…”
Section: Inter-atomic Long-dependence Feature Extraction Methods Based…
mentioning; confidence: 99%
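To make the "repeated neural network module" concrete, here is a minimal NumPy sketch of a single LSTM step unrolled over a toy sequence. The fused gate parameterization and the variable names are illustrative choices, not taken from the cited works.

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM; the same module is applied at every time step."""
    z = W @ x_t + U @ h_prev + b                 # fused affine map for all four gates
    i, f, o, g = np.split(z, 4)                  # input, forget, output gates and candidate
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_t = f * c_prev + i * np.tanh(g)            # update the cell (long-term) state
    h_t = o * np.tanh(c_t)                       # emit the hidden (short-term) state
    return h_t, c_t

# Unroll the same module over a toy sequence of 5 input vectors.
d_in, d_hid = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_hid, d_in))
U = rng.normal(size=(4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)
h, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in rng.normal(size=(5, d_in)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```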
“…This mapping is learned via the orthogonal Procrustes method [125], using bilingual dictionaries between the sources and the target language [61]. Kim et al. [71] proposed a variant of this approach in which the parent model is trained first and monolingual word embeddings of the child source are mapped to the parent source's embeddings prior to fine-tuning. While Gu et al. [54] require the child and parent sources to be mapped while training the parent model, the mapping in Kim et al.'s [71] model can be trained after the parent model has been trained.…”
Section: Lexical Transfer
mentioning; confidence: 99%
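The orthogonal Procrustes step mentioned above has a closed-form SVD solution. Below is a minimal NumPy sketch that fits an orthogonal map from child-language embeddings to the parent embedding space using dictionary pairs; the function name and the toy data are assumptions for illustration.

```python
import numpy as np

def procrustes_mapping(X, Y):
    """Solve min_W ||X W - Y||_F over orthogonal W (closed form via SVD).

    X: (n, d) child-language embeddings of dictionary entries
    Y: (n, d) parent-language embeddings of their translations
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy usage: recover a known orthogonal map between two embedding spaces.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                      # 5 dictionary entries, dimension 4
W_true, _ = np.linalg.qr(rng.normal(size=(4, 4)))
Y = X @ W_true
W = procrustes_mapping(X, Y)
print(np.allclose(X @ W, Y))                     # True: the orthogonal map is recovered
```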
“…Nguyen and Chiang (2017) and Kocmi and Bojar (2018) with more languages and help target language switches. Kim et al. (2019) propose additional techniques to enable NMT transfer even without shared vocabularies. To the best of our knowledge, we are the first to propose transfer learning strategies specialized in utilizing a pivot language, transferring a source encoder and a target decoder at the same time.…”
Section: Related Work
mentioning; confidence: 99%