We describe the results of a short-term SEE-ERAnet project whose aim was to investigate the feasibility of machine translation (MT) research and development for several South Slavic and Balkan languages. The main tasks of the project were: compiling a multilingual parallel corpus for these languages; the XML mark-up of the corpus (tokenization, lemmatization, tagging); sentence and word alignment of the corpus; and building statistical translation models. Using the resulting resources and models, we also conducted preliminary experiments on building prototype MT systems for Romanian <-> English, Greek <-> English and Slovene <-> English. We argue that by investing effort in building accurate language resources (the larger, the better) and in fine-tuning the statistical parameters, current machine-learning technologies can be used to develop acceptable MT prototypes quickly, and that such prototypes are valuable starting points for implementing working systems. We support this claim with recent results from a follow-up national project aimed at developing a Romanian <-> English translation system.
We briefly describe a word alignment system that combines two different methods for identifying bitext correspondences. The first is a hypothesis-testing approach (Gale and Church, 1991; Melamed, 2001; Tufiş, 2002), while the second is closer to a model-estimation approach (Brown et al., 1993; Och and Ney, 2000). We show that combining the two aligners yields significantly better results than either aligner alone.
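The abstract does not say how the outputs of the two aligners are merged. As an illustration only, the following sketch assumes that each aligner returns a set of (source index, target index) links for a sentence pair and compares two common combination heuristics, union and intersection, scored with precision, recall and F-measure against a gold-standard alignment. The function names `combine` and `prf` and the toy link sets are hypothetical and are not taken from the described system.

```python
from typing import Set, Tuple

# A word-alignment link: (source token index, target token index).
Link = Tuple[int, int]


def combine(a: Set[Link], b: Set[Link], mode: str = "union") -> Set[Link]:
    """Merge two aligners' link sets (hypothetical combination step)."""
    if mode == "union":          # keep every link proposed by either aligner
        return a | b
    if mode == "intersection":   # keep only links both aligners agree on
        return a & b
    raise ValueError(f"unknown mode: {mode}")


def prf(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of predicted links against gold links."""
    if not predicted or not gold:
        return 0.0, 0.0, 0.0
    correct = len(predicted & gold)
    p = correct / len(predicted)
    r = correct / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical toy data for one sentence pair (not results from the paper).
gold: Set[Link] = {(0, 0), (1, 1), (2, 3), (3, 2)}
aligner_a: Set[Link] = {(0, 0), (1, 1), (2, 2)}          # e.g. a test-based aligner
aligner_b: Set[Link] = {(0, 0), (2, 3), (3, 2), (3, 3)}  # e.g. a model-based aligner

if __name__ == "__main__":
    for name, links in [
        ("A", aligner_a),
        ("B", aligner_b),
        ("union", combine(aligner_a, aligner_b, "union")),
        ("intersection", combine(aligner_a, aligner_b, "intersection")),
    ]:
        p, r, f = prf(links, gold)
        print(f"{name:>12}: P={p:.3f} R={r:.3f} F={f:.3f}")
```

On this toy data the union reaches F = 0.8, against about 0.57 and 0.75 for the individual aligners, because the two aligners mostly find different correct links. When the aligners disagree mainly on wrong links, the intersection can be preferable, since it trades recall for precision.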