Abstract. A new bilingual dictionary can be built from two existing bilingual dictionaries; for example, Japanese-English and English-Chinese dictionaries can be combined to build a Japanese-Chinese dictionary. However, because Japanese and Chinese are closer to each other than either is to English, there should be a more direct way of doing this. Since many Japanese words are composed of kanji, which are similar to Chinese hanzi, we attempt to build a dictionary for kanji words by simple conversion from kanji to hanzi. Our survey shows that around 2/3 of the nouns and verbal nouns in Japanese are kanji words, and more than 1/3 of them can be translated into Chinese directly. The accuracy of conversion is 97%. In addition, we obtain translation candidates for 24% of the Japanese words using English as a pivot language, with 77% accuracy. By adding the kanji/hanzi conversion method, we increase the share of words with candidates by 9 percentage points, to 33%, and the candidates are of better quality.
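The two candidate sources described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the three-entry `KANJI_TO_HANZI` table, the toy dictionaries, and in particular the step that accepts a converted word only if it appears in a Chinese word list, which the abstract does not describe.

```python
# Hypothetical toy character table; a real table would cover thousands of kanji.
KANJI_TO_HANZI = {"図": "图", "書": "书", "館": "馆"}


def convert_kanji_word(word, table, zh_lexicon):
    """Map each kanji to a hanzi (identity if unmapped).

    Assumption: the result is kept only if it is a known Chinese word.
    """
    converted = "".join(table.get(ch, ch) for ch in word)
    return converted if converted in zh_lexicon else None


def pivot_candidates(ja_en, en_zh):
    """Collect Chinese candidates that share an English gloss with a Japanese word."""
    candidates = {}
    for ja_word, glosses in ja_en.items():
        found = set()
        for gloss in glosses:
            found.update(en_zh.get(gloss, ()))
        if found:
            candidates[ja_word] = found
    return candidates


def build_candidates(ja_words, ja_en, en_zh, table, zh_lexicon):
    """Merge pivot-based candidates with direct kanji/hanzi conversions."""
    merged = pivot_candidates({w: ja_en.get(w, []) for w in ja_words}, en_zh)
    for word in ja_words:
        direct = convert_kanji_word(word, table, zh_lexicon)
        if direct is not None:
            merged.setdefault(word, set()).add(direct)
    return merged


# Toy data, invented for this example.
ja_en = {"図書館": ["library"], "犬": ["dog"]}
en_zh = {"library": ["图书馆"], "dog": ["狗"]}
zh_lexicon = {"图书馆", "狗", "学校"}

result = build_candidates(["図書館", "犬", "学校"], ja_en, en_zh,
                          KANJI_TO_HANZI, zh_lexicon)
```

In this toy run, 学校 has no English gloss and so gets no pivot candidate, yet the direct conversion still yields one. That mirrors, in miniature, how the conversion method can add coverage beyond the pivot approach.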
In this paper, we propose to use various global features for discriminative reranking in a statistical machine translation (SMT) framework. We employ an online large-margin training algorithm for structured-output support vector machines, based on the margin infused relaxed algorithm (MIRA). Besides the standard features, such as decoder scores, source and target sentences, alignments and part-of-speech tags, we include sentence type probabilities, posterior probabilities and back-translation features for reranking. These features have proved useful in other approaches to statistical machine translation, but this is the first attempt to apply them in reranking. Our experimental results on the 160K BTEC corpus show an improvement of 1-4 BLEU percentage points on Japanese/Chinese-to-English translation.
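As a rough illustration of online large-margin training for reranking, the sketch below implements a single-constraint (1-best) MIRA-style update over sparse feature dictionaries. It is a simplification under stated assumptions, not the training procedure of the paper: the function names, the feature names, the clipping constant `C`, and the choice of one oracle hypothesis against one predicted hypothesis are all hypothetical.

```python
def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in features.items())


def mira_update(weights, feats_oracle, feats_pred, loss, C=1.0):
    """Return new weights after one 1-best MIRA-style step.

    The step is the smallest change (clipped at C) that makes the oracle
    outscore the prediction by a margin of at least `loss`.
    """
    keys = set(feats_oracle) | set(feats_pred)
    diff = {k: feats_oracle.get(k, 0.0) - feats_pred.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return dict(weights)  # identical feature vectors: nothing to learn
    margin = dot(weights, diff)
    tau = max(0.0, min(C, (loss - margin) / norm_sq))
    updated = dict(weights)
    for k, v in diff.items():
        updated[k] = updated.get(k, 0.0) + tau * v
    return updated


def rerank(weights, nbest):
    """Pick the highest-scoring hypothesis from an n-best list."""
    return max(nbest, key=lambda hyp: dot(weights, hyp["features"]))


# Toy n-best list with invented feature names and values.
nbest = [
    {"text": "hyp A", "features": {"decoder_score": 1.0}},
    {"text": "hyp B", "features": {"back_translation": 1.0}},
]
w = mira_update({}, nbest[1]["features"], nbest[0]["features"], loss=1.0)
best = rerank(w, nbest)
```

Here hypothesis B is treated as the oracle (for instance, the candidate with the best sentence-level BLEU), and `loss` would be the metric gap between the oracle and the current top hypothesis. After the update, B outscores A.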
SUMMARY We present a method to constrain a statistical generative word alignment model with the output of a discriminative model. The discriminative model is trained on a small set of hand-aligned data, which ensures higher alignment precision. The generative model, in turn, improves alignment recall. By combining these two models, the alignment output becomes more suitable for developing a translation model for a phrase-based statistical machine translation (SMT) system. Our experimental results show that the joint alignment model improves translation performance: the average of the BLEU and METEOR scores improves by around 1.0-3.9 points.
key words: word alignment, discriminative model, generative model, hybrid, SMT