Masao Utiyama scite author profile

Guiding Neural Machine Translation with Retrieved Translation Pieces

One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process. Specifically, for an input sentence, we use a search engine to retrieve sentence pairs whose source sides are similar to the input sentence, and then collect n-grams that are both in the retrieved target sentences and aligned with words that match in the source sentences, which we call "translation pieces". We compute pseudo-probabilities for each retrieved sentence based on similarities between the input sentence and the retrieved source sentences, and use these to weight the retrieved translation pieces. Finally, an existing NMT model is used to translate the input sentence, with an additional bonus given to outputs that contain the collected translation pieces. We show that our method improves NMT translation results by up to 6 BLEU points on three narrow-domain translation tasks where repetitiveness of the target sentences is particularly salient. It also causes little increase in translation time, and compares favorably to another retrieval-based method with respect to accuracy, speed, and simplicity of implementation.¹

¹ Note that there are existing retrieval-based methods for phrase-based and hierarchical phrase-based translation (Lopez, 2007; Germann, 2015). However, these methods do not improve translation quality but rather aim to improve the efficiency of the translation models.
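The retrieval-and-bonus pipeline in this abstract can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the sentence similarity is assumed to be 1 minus word-level edit distance divided by the longer sentence length, matched source words are found with Python's `difflib.SequenceMatcher`, the word alignment is a hypothetical input (a set of `(src_index, tgt_index)` pairs), and the decoding bonus simply sums the weights of pieces that the next token completes. The helper names (`collect_pieces`, `piece_bonus`, and so on) are invented for this sketch.

```python
from difflib import SequenceMatcher


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                      # delete tok_a
                           cur[j - 1] + 1,                   # insert tok_b
                           prev[j - 1] + (tok_a != tok_b)))  # substitute or keep
        prev = cur
    return prev[-1]


def similarity(x, xm):
    """Assumed similarity: 1 - edit distance / length of the longer sentence."""
    return 1.0 - edit_distance(x, xm) / max(len(x), len(xm), 1)


def matched_source_positions(x, xm):
    """Positions in the retrieved source xm whose words match the input x."""
    matcher = SequenceMatcher(a=x, b=xm, autojunk=False)
    matched = set()
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched.update(range(j1, j2))
    return matched


def collect_pieces(x, retrieved, max_n=4):
    """Map each translation piece (a target n-gram) to its weight.

    retrieved: list of (src_tokens, tgt_tokens, alignment) triples, where
    alignment is a set of (src_index, tgt_index) pairs (hypothetical format).
    """
    pieces = {}
    for src, tgt, alignment in retrieved:
        sim = similarity(x, src)
        matched = matched_source_positions(x, src)
        links = {}
        for s, t in alignment:
            links.setdefault(t, set()).add(s)
        # A target word is usable only if it is aligned, and only to matched words.
        usable = [t in links and links[t] <= matched for t in range(len(tgt))]
        for n in range(1, max_n + 1):
            for start in range(len(tgt) - n + 1):
                if all(usable[start:start + n]):
                    piece = tuple(tgt[start:start + n])
                    # Keep the best similarity of any retrieved pair containing it.
                    pieces[piece] = max(pieces.get(piece, 0.0), sim)
    return pieces


def piece_bonus(prefix, next_token, pieces, weight=1.0):
    """Bonus added to the log-probability of next_token: the summed weights
    of all collected pieces that end exactly at next_token."""
    hyp = list(prefix) + [next_token]
    total = 0.0
    for piece, w in pieces.items():
        if len(piece) <= len(hyp) and tuple(hyp[-len(piece):]) == piece:
            total += w
    return weight * total


# Usage: "b" in the input does not match "x" in the retrieved source.
pieces = collect_pieces(["a", "b", "c"],
                        [(["a", "x", "c"], ["A", "X", "C"], {(0, 0), (1, 1), (2, 2)})])
# pieces maps ("A",) and ("C",) to 2/3; "X" is excluded because "x" did not match.
```

During beam search, `piece_bonus` would be added (scaled by a tuned `weight`) to the model's log-probability for each candidate token; how the paper handles repeated pieces within one hypothesis is not covered by this sketch.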

Instance Weighting for Neural Machine Translation Domain Adaptation

Wang, Utiyama, Liu, et al., 2017

128

Instance weighting has been widely applied to domain adaptation for phrase-based machine translation. However, it is challenging to apply it directly to neural machine translation (NMT), because NMT is not a linear model. In this paper, two instance weighting techniques, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation. Empirical results on the IWSLT English-German/French tasks show that the proposed methods can substantially improve NMT performance by up to 2.7-6.7 BLEU points, outperforming the existing baselines by up to 1.6-3.6 BLEU points.
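To make "sentence weighting" and "domain weighting" concrete, here is a minimal sketch of a weighted training loss. It assumes each sentence's negative log-likelihood is multiplied by a per-sentence weight and that the batch loss is normalized by the weight sum (a design choice of this sketch). How the weights are computed, for example from an in-domain relevance score, is an assumption; the dynamic weight learning strategy mentioned in the abstract is not reproduced, and the function names and the 1.0/0.5 domain weights are hypothetical.

```python
import math


def weighted_nll(batch_logprobs, weights):
    """Sentence-weighted negative log-likelihood.

    batch_logprobs: one list of target-token log-probabilities per sentence.
    weights: one non-negative weight per sentence (hypothetical relevance
    scores). Normalizing by the weight sum is a choice of this sketch.
    """
    total = sum(w * -sum(logps) for logps, w in zip(batch_logprobs, weights))
    return total / sum(weights)


def domain_weights(domains, in_weight=1.0, out_weight=0.5):
    """Domain weighting: all sentences from one domain share a single weight.

    The default values are hypothetical and fixed here, whereas the abstract
    describes learning domain weights dynamically.
    """
    return [in_weight if d == "in" else out_weight for d in domains]


# Usage: one in-domain and one out-of-domain sentence, one token each.
batch = [[math.log(0.5)], [math.log(0.25)]]
loss = weighted_nll(batch, domain_weights(["in", "out"]))
```

With all weights equal to 1 the loss reduces to the ordinary mean sentence NLL, so down-weighting out-of-domain sentences is the only change relative to standard training.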

Agreement on Target-bidirectional Neural Machine Translation

Liu, Utiyama, Finch, et al., 2016

Neural machine translation (NMT) with recurrent neural networks has proven to be an effective technique for end-to-end machine translation. However, in spite of its promising advances over traditional translation methods, it typically suffers from unbalanced outputs, an issue that arises both from the nature of recurrent neural networks themselves and from the challenges inherent in machine translation. To overcome this issue, we propose an agreement model for neural machine translation and show its effectiveness on large-scale Japanese-to-English and Chinese-to-English translation tasks. Our results show that the model can achieve improvements of up to 1.4 BLEU over the strongest baseline NMT system. With the help of an ensemble technique, this new end-to-end NMT approach outperformed phrase-based and hierarchical phrase-based Moses baselines by up to 5.6 BLEU points.
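The idea of agreement between a left-to-right (L2R) and a right-to-left (R2L) model can be illustrated by rescoring: pool the k-best outputs of both directions and pick the hypothesis with the highest summed log-probability under the two models. This is a simplified stand-in, assuming both models can score a complete sentence; the paper's actual search procedure may differ. The model scores below are toy, hypothetical values, and `rerank_bidirectional` is an invented name.

```python
from typing import Callable, Iterable, Tuple

Sentence = Tuple[str, ...]


def rerank_bidirectional(
    kbest_l2r: Iterable[Sentence],
    kbest_r2l: Iterable[Sentence],
    logp_l2r: Callable[[Sentence], float],
    logp_r2l: Callable[[Sentence], float],
) -> Sentence:
    """Choose the hypothesis that both decoding directions agree on most.

    kbest_r2l holds right-to-left outputs in generation (reversed) order,
    and logp_r2l scores a sequence given in that reversed order.
    """
    # Pool both k-best lists in normal left-to-right word order.
    pool = {tuple(y) for y in kbest_l2r} | {tuple(reversed(y)) for y in kbest_r2l}

    def joint(y: Sentence) -> float:
        # Agreement score: sum of log-probabilities under both models.
        return logp_l2r(y) + logp_r2l(tuple(reversed(y)))

    # Sorting the pool makes tie-breaking deterministic.
    return max(sorted(pool), key=joint)


# Usage with toy, hypothetical model scores.
l2r = {("a", "b", "c"): -1.0, ("a", "b", "d"): -0.8}
r2l = {("c", "b", "a"): -0.5, ("d", "b", "a"): -2.0}
best = rerank_bidirectional(
    kbest_l2r=[("a", "b", "d"), ("a", "b", "c")],
    kbest_r2l=[("c", "b", "a")],
    logp_l2r=l2r.__getitem__,
    logp_r2l=r2l.__getitem__,
)
# best == ("a", "b", "c"), although the L2R model alone prefers ("a", "b", "d").
```

The example shows the intended effect: a hypothesis with a strong prefix but a weak suffix (as judged by the R2L model) loses to one that both directions find acceptable.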

Reliable measures for aligning Japanese-English news articles and sentences

Utiyama and Isahara, 2003

104

We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignments. To remove these, we propose two measures (scores) that evaluate the validity of alignments. The measure for article alignment uses similarities between sentences aligned by DP matching, and the measure for sentence alignment uses similarities between articles aligned by CLIR. The two measures reinforce each other to improve alignment accuracy. Using these measures, we have successfully constructed a large-scale article and sentence alignment corpus that is available to the public.
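The way the two measures feed into each other can be sketched as follows. This follows one reading of the paper and is not stated in the abstract, so treat it as an assumption: sentence similarity SIM counts one-to-one word correspondences found in a bilingual dictionary, divided by the combined length of the two sentences; the article score AVSIM is the mean SIM over the DP-aligned sentence pairs of an article pair; and a sentence pair's score is AVSIM multiplied by its own SIM. The greedy matching, the toy lexicon (romanized Japanese), and the function names are hypothetical simplifications.

```python
def sim(j_words, e_words, dictionary):
    """Sentence similarity: one-to-one dictionary matches over total length.

    dictionary maps a Japanese word to a set of English translations
    (hypothetical toy lexicon). Matching is greedy, and each English
    word can be used at most once.
    """
    unused = list(e_words)
    matches = 0
    for jw in j_words:
        for k, ew in enumerate(unused):
            if ew in dictionary.get(jw, ()):
                matches += 1
                del unused[k]  # safe: we stop scanning right after deleting
                break
    total = len(j_words) + len(e_words)
    return matches / total if total else 0.0


def avsim(sentence_pairs, dictionary):
    """Article score: mean similarity of the DP-aligned sentence pairs."""
    if not sentence_pairs:
        return 0.0
    return sum(sim(j, e, dictionary) for j, e in sentence_pairs) / len(sentence_pairs)


def snt_score(j_words, e_words, article_score, dictionary):
    """Sentence score: sentence similarity scaled by its article's score."""
    return article_score * sim(j_words, e_words, dictionary)


# Usage with a toy, hypothetical lexicon.
lexicon = {"inu": {"dog"}, "neko": {"cat"}, "hashiru": {"run", "runs"}}
pairs = [(["inu", "ga", "hashiru"], ["the", "dog", "runs"]),
         (["neko"], ["a", "bird"])]
article = avsim(pairs, lexicon)                 # (2/6 + 0/3) / 2 = 1/6
first = snt_score(*pairs[0], article, lexicon)  # 1/6 * 1/3 = 1/18
```

Because a sentence pair's score is scaled by its article's score, a well-matched sentence pair inside a poorly aligned article pair is still ranked low, which is how the article and sentence measures reinforce each other.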

Reliable Measures for Aligning Japanese-English News Articles and Sentences

Utiyama and Isahara, 2003

Journal of Natural Language Processing

We have aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to make a large parallel corpus. We first used a method based on cross-language information retrieval to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the resulting article and sentence alignments included many incorrect ones. To remove these, we propose two measures that evaluate the validity of the alignments. Using these measures, we successfully extracted valid correspondences for about 47,000 article pairs, 150,000 one-to-one sentence pairs, and 38,000 one-to-many sentence pairs. We were therefore able to build the largest Japanese-English parallel corpus available to the public.
