Evaluating translational correspondence using annotation projection

Hwa, Rebecca; Resnik, Philip; Weinberg, Amy; Kolak, Okan

doi:10.3115/1073083.1073149

Cited by 50 publications

(60 citation statements)

References 19 publications

“…is similar to factory (En.). Also related are the areas of word alignment for machine translation (Och and Ney, 2000), induction of translation lexicons, and cross-language annotation projections to a second language (Riloff et al., 2002; Hwa et al., 2002; Mohammad et al., 2007). As with cross-language information retrieval, these areas have primarily considered direct translations between words, rather than an entire spectrum of relatedness, as we do in our work.…”

Section: Related Work

mentioning

confidence: 99%

“…Measures of cross-language relatedness are useful for a large number of applications, including cross-language information retrieval (Nie et al., 1999; Monz and Dorr, 2005), cross-language text classification (Gliozzo and Strapparava, 2006), lexical choice in machine translation (Och and Ney, 2000; Bangalore et al., 2007), induction of translation lexicons (Schafer and Yarowsky, 2002), cross-language annotation and resource projections to a second language (Riloff et al., 2002; Hwa et al., 2002; Mohammad et al., 2007).…”

Section: Motivation

mentioning

confidence: 99%

“…For instance, given the word factory in English and the word lavoratore in Italian (En. worker), the method can measure the relatedness of these two words despite the fact that they belong to two different languages. Measures of cross-language relatedness are useful for a large number of applications, including cross-language information retrieval (Nie et al., 1999; Monz and Dorr, 2005), cross-language text classification (Gliozzo and Strapparava, 2006), lexical choice in machine translation (Och and Ney, 2000; Bangalore et al., 2007), induction of translation lexicons (Schafer and Yarowsky, 2002), cross-language annotation and resource projections to a second language (Riloff et al., 2002; Hwa et al., 2002; Mohammad et al., 2007). The method we propose is based on a measure of closeness between concept vectors automatically built from Wikipedia, which are mapped via the Wikipedia interlanguage links. Unlike previous methods for cross-language mapping, which are typically limited by the availability of bilingual dictionaries or parallel texts, the method proposed in this paper can be used to measure the relatedness of word pairs in any of the 250 languages for which a Wikipedia version exists.…”

mentioning

confidence: 99%

Cross-lingual semantic relatedness using encyclopedic knowledge

Hassan; Mihalcea (2009)

Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 3 - EMNLP '09

In this paper, we address the task of cross-lingual semantic relatedness. We introduce a method that relies on the information extracted from Wikipedia, by exploiting the interlanguage links available between Wikipedia versions in multiple languages. Through experiments performed on several language pairs, we show that the method performs well, with a performance comparable to monolingual measures of relatedness.

Motivation

Given the accelerated growth of the number of multilingual documents on the Web and elsewhere, the need for effective multilingual and cross-lingual text processing techniques is becoming increasingly important. In this paper, we address the task of cross-lingual semantic relatedness, and introduce a method that relies on Wikipedia in order to calculate the relatedness of words across languages. For instance, given the word factory in English and the word lavoratore in Italian (En. worker), the method can measure the relatedness of these two words despite the fact that they belong to two different languages.

Measures of cross-language relatedness are useful for a large number of applications, including cross-language information retrieval (Nie et al., 1999; Monz and Dorr, 2005), cross-language text classification (Gliozzo and Strapparava, 2006), lexical choice in machine translation (Och and Ney, 2000; Bangalore et al., 2007), induction of translation lexicons (Schafer and Yarowsky, 2002), cross-language annotation and resource projections to a second language (Riloff et al., 2002; Hwa et al., 2002; Mohammad et al., 2007).

The method we propose is based on a measure of closeness between concept vectors automatically built from Wikipedia, which are mapped via the Wikipedia interlanguage links. Unlike previous methods for cross-language mapping, which are typically limited by the availability of bilingual dictionaries or parallel texts, the method proposed in this paper can be used to measure the relatedness of word pairs in any of the 250 languages for which a Wikipedia version exists.

The paper is organized as follows. We first provide a brief overview of Wikipedia, followed by a description of the method to build concept vectors based on this encyclopedic resource. We then show how these concept vectors can be mapped across languages for a cross-lingual measure of word relatedness. Through evaluations run on six language pairs, connecting English, Spanish, Arabic and Romanian, we show that the method is effective at capturing the cross-lingual relatedness of words, with results comparable to the monolingual measures of relatedness.
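The idea of comparing concept vectors across languages, as described in the abstract above, might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `map_concepts` and `cosine`, the toy concept weights, and the choice of cosine similarity as the "measure of closeness" are all assumptions, since the text here does not specify them.

```python
import math
from typing import Dict

# A concept vector maps a Wikipedia concept (article title) to a weight.
ConceptVector = Dict[str, float]


def map_concepts(vector: ConceptVector, links: Dict[str, str]) -> ConceptVector:
    """Rewrite a concept vector into the other language's concept space.

    `links` is a hypothetical stand-in for interlanguage links: it maps a
    concept title in one language to the linked title in the other.
    Concepts without a link are dropped.
    """
    mapped: ConceptVector = {}
    for concept, weight in vector.items():
        target = links.get(concept)
        if target is not None:
            mapped[target] = mapped.get(target, 0.0) + weight
    return mapped


def cosine(u: ConceptVector, v: ConceptVector) -> float:
    """Cosine similarity between two sparse vectors (an assumed closeness measure)."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data, invented for illustration only.
english_vector = {"Factory": 1.0, "Industry": 0.5}
italian_vector = {"Fabbrica": 1.0}
italian_to_english = {"Fabbrica": "Factory"}

score = cosine(english_vector, map_concepts(italian_vector, italian_to_english))
```

Dropping unlinked concepts is one simple design choice; it means that coverage of the link table directly limits how much of each vector can be compared.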

“…, if $e_i$ is aligned to $c_j$ and $e_{i'}$ aligned to $c_{j'}$, according to the dependency correspondence assumption (Hwa et al., 2002), there exists a triple $\langle c_j, R_c, c_{j'} \rangle$.…”

Section: Syntactic Features

mentioning

confidence: 99%

Improving word alignment using syntactic dependencies

Ozdowska; Sun; et al. (2008)

Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation - SSST '08

We introduce a word alignment framework that facilitates the incorporation of syntax encoded in bilingual dependency tree pairs. Our model consists of two sub-models: an anchor word alignment model which aims to find a set of high-precision anchor links and a syntax-enhanced word alignment model which focuses on aligning the remaining words relying on dependency information invoked by the acquired anchor links. We show that our syntax-enhanced word alignment approach leads to a 10.32% and 5.57% relative decrease in alignment error rate compared to a generative word alignment model and a syntax-proof discriminative word alignment model respectively. Furthermore, our approach is evaluated extrinsically using a phrase-based statistical machine translation system. The results show that SMT systems based on our word alignment approach tend to generate shorter outputs. Without length penalty, using our word alignments yields statistically significant improvement in Chinese-English machine translation in comparison with the baseline word alignment.
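The dependency correspondence assumption quoted in the citation statement for this paper (two aligned English words in a dependency relation imply a relation between their aligned Chinese words) could be illustrated with a small sketch. Everything here is hypothetical: the function name `project_dependencies`, the index-based data layout, and the toy sentence are invented for illustration, and copying the source relation label onto the target side is a simplifying assumption (the quoted statement only says that some relation R_c exists).

```python
from typing import Dict, Iterable, List, Set, Tuple

# (head index, relation label, dependent index)
Dependency = Tuple[int, str, int]


def project_dependencies(
    source_deps: Iterable[Dependency],
    alignment: Dict[int, List[int]],
) -> Set[Dependency]:
    """Project source-side dependencies onto target-side word indices.

    `alignment` maps a source word index to the target word indices it is
    aligned with. A dependency is projected only when both its head and
    its dependent are aligned; unaligned words contribute nothing.
    """
    projected: Set[Dependency] = set()
    for head, relation, dependent in source_deps:
        for target_head in alignment.get(head, []):
            for target_dependent in alignment.get(dependent, []):
                # Skip degenerate cases where both words map to one token.
                if target_head != target_dependent:
                    # Assumption: reuse the source label on the target side.
                    projected.add((target_head, relation, target_dependent))
    return projected


# Toy example: a three-word source sentence with a monotone alignment.
source_deps = [(1, "nsubj", 0), (1, "obj", 2)]
alignment = {0: [0], 1: [1], 2: [2]}
target_deps = project_dependencies(source_deps, alignment)
```

With a one-to-one alignment the projected structure mirrors the source tree; one-to-many or missing alignments are where such a direct projection starts to produce extra or absent edges.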

“…The study in [5] estimates the degree of syntactic parallelism in dependency relations between English and Chinese. Nevertheless direct correspondence is often too restrictive and syntactic projection yields good enough annotations to train a dependency parser.…”

Section: Introduction

mentioning

confidence: 99%

Cross-Lingual Alignment of FrameNet Annotations through Hidden Markov Models

Annesi; Basili (2010)

Lecture Notes in Computer Science

Abstract. The development of annotated resources in the area of frame semantics has been crucial to the development of robust systems for shallow semantic parsing. Resource-poor languages have shown a significant delay due to the lack of sufficient training data. Recent works proposed to exploit parallel corpora in order to automatically transfer the semantic information available for English to other target languages. In this paper, an approach based on Hidden Markov Models is proposed to support the automatic semantic transfer and use an aligned bilingual corpus to develop large scale annotated data sets. As this method relies just on lexical alignment of sentence pairs, it is robust against preprocessing errors and does not require complex optimization, like syntax-dependent models for accurate cross-lingual mapping. The experimental evaluation over an English-Italian corpus is successful, achieving 86% of accuracy on average, and improves on the state of the art methods for the same task.
