Haipeng Sun scite author profile

Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs. However, it can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time. That is, research on multilingual UNMT has been limited. In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder, making use of multilingual data to improve UNMT for all language pairs. On the basis of the empirical findings, we propose two knowledge distillation methods to further enhance multilingual UNMT performance. Our experiments on a dataset with English translated to and from twelve other languages (including three language families and six language branches) show remarkable results, surpassing strong unsupervised individual baselines while achieving promising performance between non-English language pairs in zero-shot translation scenarios and alleviating poor performance in low-resource language pairs.

Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation

Sun, Wang, Chen et al. 2019

Unsupervised bilingual word embedding (UBWE), together with other technologies such as back-translation and denoising, has helped unsupervised neural machine translation (UNMT) achieve remarkable results in several language pairs. In previous methods, UBWE is first trained using nonparallel monolingual corpora and then this pre-trained UBWE is used to initialize the word embedding in the encoder and decoder of UNMT. That is, the training of UBWE and UNMT are separate. In this paper, we first empirically investigate the relationship between UBWE and UNMT. The empirical findings show that the performance of UNMT is significantly affected by the performance of UBWE. Thus, we propose two methods that train UNMT with UBWE agreement. Empirical results on several language pairs show that the proposed methods significantly outperform conventional UNMT.

NICT’s Unsupervised Neural and Statistical Machine Translation Systems for the WMT19 News Translation Task

Marie, Sun, Wang et al. 2019

This paper presents NICT's participation in the WMT19 unsupervised news translation task. We participated in the unsupervised translation direction German-Czech. Our primary submission to the task is the result of a simple combination of our unsupervised neural and statistical machine translation systems. Our system is ranked first for the German-to-Czech translation task, using only the data provided by the organizers ("constraint"), according to both BLEU-cased and human evaluation. We also performed contrastive experiments with other language pairs, namely, English-Gujarati and English-Kazakh, to better assess the effectiveness of unsupervised machine translation for distant language pairs and in truly low-resource conditions. * Equal contribution in alphabetical order. This work was conducted while Haipeng Sun was visiting NICT as an intern.

A fine‐grained and traceable multidomain secure data‐sharing model for intelligent terminals in edge‐cloud collaboration scenarios

Sun, Tan, Zhu et al. 2021

Int J of Intelligent Sys

Secure data‐sharing technology is a bridge for various collaborative operations among intelligent terminals in the edge‐cloud collaborative application scenario. The security of data sharing is severely threatened for three reasons: the shared data involve different levels of confidentiality, the intelligent terminals taking part in collaborative operations may be distributed across multiple management domains, and the private information of intelligent terminals is easily leaked in edge‐cloud collaboration scenarios. To solve these problems, this paper proposes a fine‐grained and traceable multidomain secure data‐sharing model for intelligent terminals. In this model, a key self‐certification algorithm is proposed, which avoids the potential security threat of key leakage during the key distribution process. The model combines attribute encryption and a threshold function to achieve more fine‐grained and more flexible secure data sharing; it uses blockchain technology to achieve integrity verification of stored data and traceability of shared data, and it combines on‐chain and off‐chain databases to achieve rapid retrieval and positioning of shared data distributed among multiple domains, which improves the efficiency of data sharing among domains. The security of the proposed model is proved, and a comparison with the cited literature shows that it has certain advantages in terms of computational complexity and time consumption.

English-Myanmar Supervised and Unsupervised NMT: NICT’s Machine Translation Systems at WAT-2019

Wang, Sun, Chen et al. 2019

This paper presents NICT's participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically the Myanmar (Burmese)-English task in both translation directions. We built neural machine translation (NMT) systems for these tasks. Our NMT systems were trained with language model pretraining, and back-translation was also applied. Our NMT systems ranked third in English-to-Myanmar and second in Myanmar-to-English according to BLEU score.

Unsupervised Neural Machine Translation With Cross-Lingual Language Representation Agreement

Sun, Wang, Chen et al. 2020

IEEE/ACM Trans. Audio Speech Lang. Process.

Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation

Sun, Wang, Chen et al. 2020

Preprint

Unsupervised Neural Machine Translation for Similar and Distant Language Pairs

Sun, Wang, Utiyama et al. 2021

ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Unsupervised neural machine translation (UNMT) has achieved remarkable results for several language pairs, such as French–English and German–English. Most previous studies have focused on modeling UNMT systems; few studies have investigated the effect of UNMT on specific languages. In this article, we first empirically investigate UNMT for four diverse language pairs (French/German/Chinese/Japanese–English). We confirm that the performance of UNMT in translation tasks for similar language pairs (French/German–English) is dramatically better than for distant language pairs (Chinese/Japanese–English). We empirically show that the lack of shared words and different word orderings are the main reasons that lead UNMT to underperform in Chinese/Japanese–English. Based on these findings, we propose several methods, including artificial shared words and pre-ordering, to improve the performance of UNMT for distant language pairs. Moreover, we propose a simple general method to improve translation performance for all these four language pairs. The existing UNMT model can generate a translation of a reasonable quality after a few training epochs owing to a denoising mechanism and shared latent representations. However, learning shared latent representations restricts the performance of translation in both directions, particularly for distant language pairs, while denoising dramatically delays convergence by continuously modifying the training data. To avoid these problems, we propose a simple, yet effective and efficient, approach that (like UNMT) relies solely on monolingual corpora: pseudo-data-based unsupervised neural machine translation. Experimental results for these four language pairs show that our proposed methods significantly outperform UNMT baselines.
