Because of the scarcity of bilingual corpora, current Chinese--Vietnamese machine translation is far from satisfactory. Considering the differences between Chinese and Vietnamese, we investigate whether linguistic differences can be used to supervise machine translation and propose a method of syntax-based Chinese--Vietnamese tree-to-tree statistical machine translation with bilingual features. Analyzing the syntax differences between Chinese and Vietnamese, we define some linguistic difference-based rules, such as attributive position, time adverbial position, and locative adverbial position, and create rewards for similar rules. These rewards are integrated into the extraction of tree-to-tree translation rules, and we optimize the pruning of the search space during the decoding phase. The experiments on Chinese--Vietnamese bilingual sentence translation show that the proposed method performs better than several compared methods. Further, the results show that syntactic difference features, with search pruning, can improve the accuracy of machine translation without degrading the efficiency.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.