Proceedings of the Third International Workshop on Cross Lingual Information Access Addressing the Information Need of Multilin 2009
DOI: 10.3115/1572433.1572438

Directions for exploiting asymmetries in multilingual Wikipedia

Abstract: Multilingual Wikipedia has been used extensively for a variety of Natural Language Processing (NLP) tasks. Many Wikipedia entries (people, locations, events, etc.) have descriptions in several languages. These descriptions, however, are not identical. On the contrary, descriptions in different languages created for the same Wikipedia entry can vary greatly in terms of description length and information choice. Keeping these peculiarities in mind is necessary while using multilingual Wikipedia as a corpus for trai…

citations
Cited by 8 publications
(6 citation statements)
references
References 8 publications
“…For example, Table I illustrates inconsistencies in the German 3 and English 4 versions of the article "Gohi Bi Zoro Cyriac" caused by the information contained only in the German version and related to the move of this footballer to Charlton Athletic in 2007. On a more general note, previous studies have shown the information asymmetries across Wikipedia language pairs: Although the English Wikipedia is by far the biggest with respect to the number of articles, edits and users 2 , it has been shown that for many entities, Wikipedia articles in other languages are much longer than the corresponding descriptions in English and may contain contradictory information [Filatova 2009]. Paramita et al [2012] conducted a user study on a random sample of 800 cross-lingual partner articles to find out that 28.8% of them are only moderately similar and 18.8% were judged to be different.…”
Section: English Text Passage (mentioning)
confidence: 99%
“…Many topics that are primarily of local interest but are also known globally might be present in English as well as local language Wikipedia. However, due to the editors' preferences and biases, information present in the local language edition might not be present in the English edition, and vice versa (Filatova, 2009). While due to the much larger community of editors, English Wikipedia is often more comprehensive, often the finer details of a topic and specific local common knowledge are more likely to be present in the local language edition (Hecht & Gergle, 2009).…”
Section: Information Asymmetry In Different Wikipedia Editions (mentioning)
confidence: 99%
“…Hecht and Gergle (2010) found that about 74% of all the concepts present in Wikipedia are present in only one language edition indicating that the different editions of Wikipedia cover vastly different topics. Filatova (2009) considered a set of 48 people in DUC 2004 biography generation task and studied how many Wikipedia editions contained pages for these people and compared their length. Barrón-Cedeño et al (2014) showed that language-independent similarity measures such as character n-grams and word-count ratio are effective in measuring the cross-lingual similarity of Wikipedia articles and found no statistically significant difference between language dependent models (translation, monolingual, etc.).…”
Section: Studying Information Asymmetry In Wikipedia (mentioning)
confidence: 99%
“…In March 2004, he joined ASEC Mimosas. [2] Cyriac was a topscorer of have shown the information asymmetries across Wikipedia language pairs: Although the English Wikipedia is by far the biggest with respect to the number of articles, edits and users 2 , it has been shown that for many entities, Wikipedia articles in other languages are much longer than the corresponding descriptions in English and may contain contradictory information [Filatova 2009]. Paramita et al [2012] conducted a user study on a random sample of 800 cross-lingual partner articles to find out that 28.8% of them are only moderately similar and 18.8% were judged to be different.…”
Section: English Text Passage (mentioning)
confidence: 99%