Information asymmetry in Wikipedia across different languages: A statistical analysis

Roy, Debapriya Basu; Bhatia, Sumit; Jain, Prateek

doi:10.1002/asi.24553

Cited by 13 publications

(13 citation statements)

References 37 publications

“…The emerging need to analyze multilingual information on the web has been targeted in a variety of studies, e.g., [55]. Wikipedia is an essential source for multilingual studies regarding the content, number of users, and language coverage.…”

Section: Related Work (mentioning)

confidence: 99%

LaSER: Language-specific event recommendation

Abdollahi

Gottschalk

Demidova

2023

Journal of Web Semantics

“…For instance, Wikipedia is an excellent open-access platform for finding multilingual translations of technical and scientific topics. However, it is currently underused by several scientific disciplines, and several languages with large numbers of speakers (such as Hindi and Turkish) are underrepresented (Kincaid et al 2020, Roy et al 2021).…”

Section: Short-term Actions: Translation and The Promotion Of Multili... (mentioning)

confidence: 99%

Overcoming Language Barriers in Academia: Machine Translation Tools and a Vision for a Multilingual Future

et al. 2022

Having a central scientific language remains crucial for advancing and globally sharing science. Nevertheless, maintaining one dominant language also creates barriers to accessing scientific careers and knowledge. From an interdisciplinary perspective, we describe how, when, and why to make scientific literature more readily available in multiple languages through the practice of translation. We broadly review the advantages and limitations of neural machine translation systems and propose that translation can serve as both a short- and a long-term solution for making science more resilient, accessible, globally representative, and impactful beyond the academy. We outline actions that individuals and institutions can take to support multilingual science and scientists, including structural changes that encourage and value translating scientific literature. In the long term, improvements to machine translation technologies and collective efforts to change academic norms can transform a monolingual scientific hub into a multilingual scientific network. Translations are available in the supplemental material.

“…For instance, a system needs to answer in Arabic to an Arabic question, but it can use evidence passages written in any language included in a large-document corpus such as English, German, Japanese and so on. In real-world applications, the issues of information asymmetry and information scarcity (Roy et al, 2022; Blasi et al, 2022; Asai et al, 2021a; Joshi et al, 2020) arise in many languages, hence the need to source answer contents from other languages - yet we often do not know a priori in which language the evidence can be found to answer a question.…”

Section: Task Formulation (mentioning)

confidence: 99%

MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages

Asai

Longpre

Kasai

et al. 2022

Proceedings of the Workshop on Multilingual Information Access (MIA)

We present the results of the Workshop on Multilingual Information Access (MIA) 2022 Shared Task, evaluating cross-lingual open-retrieval question answering (QA) systems in 16 typologically diverse languages. In this task, we adapted two large-scale cross-lingual open-retrieval QA datasets in 14 typologically diverse languages, and newly annotated open-retrieval QA data in 2 underrepresented languages: Tagalog and Tamil. Four teams submitted their systems. The best constrained system uses entity-aware contextualized representations for document retrieval, thereby achieving an average F1 score of 31.6, which is 4.1 F1 absolute higher than the challenging baseline. The best system obtains particularly significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly zero scores. The best unconstrained system achieves 32.2 F1, outperforming our baseline by 4.5 points. The official leaderboard and baseline models are publicly available.
