2023
DOI: 10.48550/arxiv.2302.12578
Preprint
Fairness in Language Models Beyond English: Gaps and Challenges

Abstract: With language models becoming increasingly ubiquitous, it has become essential to address their inequitable treatment of diverse demographic groups and factors. Most research on evaluating and mitigating fairness harms has been concentrated on English, while multilingual models and non-English languages have received comparatively little attention. This paper presents a survey of fairness in multilingual and non-English contexts, highlighting the shortcomings of current research and the difficulties faced by m…

Cited by 5 publications (5 citation statements)
References 81 publications
“…As previously discussed, it is likely that documents written in English and from developed countries form the bulk of the training corpus - this may limit the nature of responses to specific queries and enhance existing biases. Therefore, there is an urgent need for culturally sensitive multi-lingual LLMs [40]. Moreover, in the current LLM landscape there is a lack of transparency around algorithm development and reporting related to decisions algorithms make during the review process.…”
Section: Ethical Considerations
confidence: 99%
“…While doing so, it is important to culturally contextualize NLP metrics and models. Instead of plainly translating English models into Bengali, Hindi, etc., we need to carefully consider the dimensions of fairness and types and sources of bias specific to that cultural context (Malik et al., 2022; Ramesh et al., 2023). To address this gap, this paper proposes a methodology for developing culturally centered bias-evaluation datasets in NLP.…”
Section: Related Work