Active learning is a framework that makes it possible to efficiently train statistical models by selecting informative examples from a pool of unlabeled data. Previous work has found this framework effective for machine translation (MT), making it possible to train better translation models with less effort, particularly when annotators translate short phrases instead of full sentences. However, previous methods for phrase-based active learning in MT fail to consider whether the selected units are coherent and easy for human translators to translate, and also have problems with † ,
In statistical machine translation, the pivot translation approach allows for translation of language pairs with little or no parallel data by introducing a third language for which data exists. In particular, the triangulation method, which translates by combining source-pivot and pivot-target translation models into a source-target model is known for its high translation accuracy. However, in the conventional triangulation method, information of pivot phrases is forgotten, and not used in the translation process. In this research, we propose a novel approach to remember the pivot phrases in the triangulation stage, and use a pivot language model as an additional information source at translation phase. Experimental results on the united nations parallel corpus showed significant improvements in all tested combinations of languages.
Pivot translation is a useful method for translating between languages with little or no parallel data by utilizing parallel data in an intermediate language such as English. A popular approach for pivot translation used in phrase-based or tree-based translation models combines source-pivot and pivot-target translation models into a source-target model, as known as triangulation. However, this combination is based on the constituent words' surface forms and often produces incorrect source-target phrase pairs due to semantic ambiguity in the pivot language, and interlingual differences. This degrades translation accuracy. In this paper, we propose a approach for the triangulation using syntactic subtrees in the pivot language to distinguish pivot language words by their syntactic roles to avoid incorrect phrase combinations. Experimental results on the United Nations Parallel Corpus show the proposed method gains in all tested combinations of language, up to 2.3 BLEU points. 1
Active learning is a framework that makes it possible to efficiently train statistical models by selecting informative examples from a pool of unlabeled data. Previous work has found this framework effective for machine translation (MT), making it possible to train better translation models with less effort, particularly when anno-tators translate short phrases instead of full sentences. However, previous methods for phrase-based active learning in MT fail to consider whether the selected units are coherent and easy for human translators to translate, and also have problems with † ,
Pivot translation allows for translation of language pairs with little or no parallel data by introducing a third language for which data exists. In particular, the triangulation method, which translates by combining source-pivot and pivot-target translation models into a source-target model, is known for its high translation accuracy. However, in the conventional triangulation method, information of pivot phrases is forgotten and not used in the translation process. In this paper, we propose a novel approach to remember the pivot phrases in the triangulation stage, and use a pivot language model as an additional information source at translation time. Experimental results on the Europarl corpus showed gains of 0.4-1.2 BLEU points in all tested combinations of languages 1 .
The pivot translation is useful method for translating between languages that contain little or no parallel data by utilizing equivalents in an intermediate language such as English. Commonly, phrase-based or tree-based pivot translation methods merge source-pivot and pivot-target translation models into a source-target model. This tactic is known as triangulation. However, the combination is based on the surface forms of constituent words, and it often produces incorrect source-target phrase pairs because of interlingual differences and semantic ambiguities in the pivot language. The translation accuracy is thus degraded. This paper proposes a triangulation approach that utilizes syntactic subtrees in the pivot language to avoid incorrect phrase combinations by distinguishing pivot language words by their syntactic roles. The results of the experiments conducted on the United Nations Parallel Corpus demonstrate that the proposed method is superior to other pivot translation approaches in all tested combinations of languages.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.