Proceedings of the First Workshop on Gender Bias in Natural Language Processing 2019
DOI: 10.18653/v1/w19-3824
On Measuring Gender Bias in Translation of Gender-neutral Pronouns

Abstract: Ethics regarding social bias has recently raised pressing issues in natural language processing. Especially for gender-related topics, the need for systems that reduce model bias has grown in areas such as image captioning, content recommendation, and automated employment. However, the detection and evaluation of gender bias in machine translation systems have not yet been thoroughly investigated, as the task is cross-lingual and challenging to define. In this paper, we propose a scheme for making up a tes…

Cited by 39 publications (51 citation statements)
References 11 publications
“…Arguably this setting is more natural, as it better aligns with how systems are used in real life. Several notable examples are coreference resolution (Rudinger et al., 2018; Zhao et al., 2018; Kurita et al., 2019), machine translation (Stanovsky et al., 2019; Cho et al., 2019), textual entailment (Dev et al., 2020a), language generation (Sheng et al., 2019), or clinical classification (Zhang et al., 2020).…”
Section: Related Work
confidence: 99%
“…There has been little work done for bias in language models for Hindi, and to the best of our knowledge, there has been no previous work that measures and analyses bias for MT of Hindi. Our approach uses two existing and broad frameworks for assessing bias in MT, including the Word Embedding Fairness Evaluation (Badilla et al., 2020) and the Translation Gender Bias Index (Cho et al., 2019) on Hindi-English MT systems. We modify some of the existing procedures within these metrics required for compatibility with Hindi grammar.…”
Section: Introduction
confidence: 99%
“…1. Construction of an equity evaluation corpus (EEC) (Kiritchenko and Mohammad, 2018) for Hindi of 26,370 utterances, using 1,558 sentiment words and 1,100 occupations, following the guidelines laid out in Cho et al. (2019).…”
Section: Introduction
confidence: 99%
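The corpus construction described above is essentially a cross product of sentence templates and word lists. A minimal sketch of that idea, assuming illustrative templates and placeholder word lists (the actual Hindi templates and the 1,558/1,100 word lists are not shown in the quoted statement):

```python
# Hypothetical sketch of building an equity evaluation corpus (EEC) by
# crossing sentence templates with word lists, in the spirit of
# Kiritchenko and Mohammad (2018) and Cho et al. (2019). All templates
# and word lists below are illustrative placeholders, not the real data.
from itertools import product

templates = [
    "That person is {word}.",
    "I talked to the {word} yesterday.",
]
sentiment_words = ["kind", "arrogant"]   # stands in for the 1,558 sentiment words
occupations = ["professor", "nurse"]     # stands in for the 1,100 occupations

def build_corpus(templates, word_lists):
    """Instantiate every template with every word from every list."""
    corpus = []
    for template, words in product(templates, word_lists):
        corpus.extend(template.format(word=w) for w in words)
    return corpus

corpus = build_corpus(templates, [sentiment_words, occupations])
# 2 templates x (2 + 2) words = 8 utterances
print(len(corpus))  # 8
```

With the full word lists, the same cross product yields the corpus sizes quoted above (templates × sentiment words + templates × occupations).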
“…Previous studies accounting for MT systems' strengths and weaknesses in the translation of gender shed light on the problem but, at the same time, have limitations. On one hand, the existing evaluations focused on gender bias were largely conducted on challenge datasets, which are controlled artificial benchmarks that provide a limited perspective on the extent of the phenomenon and may force unreliable conclusions (Prates et al., 2018; Cho et al., 2019; Escudé Font and Costa-jussà, 2019; Stanovsky et al., 2019). On the other hand, the natural corpora built on conversational language that were used in a few studies (Elaraby et al., 2018; Vanmassenhove et al., 2018) include only a restricted quantity of non-isolated gender-expressing forms, thus permitting neither extensive nor targeted evaluations.…”
Section: Introduction
confidence: 99%
“…Previous attempts to test the production of gender-aware automatic translations solely focused on MT, where a widespread approach involves the creation of challenge datasets focused on specific linguistic phenomena. Prates et al. (2018) and Cho et al. (2019) construct template sentences using occupational or sentiment words associated with a gender-neutral pronoun, to be translated into an English gender-specified one ([x] is a professor: he/she is a professor). Similarly, the Occupations Test (Escudé Font and Costa-jussà, 2019) and WinoMT (Stanovsky et al., 2019) cast human entities into pro- or anti-stereotypical gender associations via coreference linking (e.g.
Section: Introduction
confidence: 99%
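The template-based evaluation described in the statement above boils down to translating gender-neutral source sentences and inspecting which pronoun the MT system produced. A minimal sketch of that scoring step, loosely following the idea behind the Translation Gender Bias Index (Cho et al., 2019); the pronoun sets and the hard-coded translations are illustrative stand-ins for real MT output, and the actual index involves additional normalization not shown here:

```python
import re

# Classify each translated sentence by the gendered pronouns it contains.
# Real evaluations would feed the template sentences through an MT system;
# here the "translations" are placeholder strings.
MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}

def classify(translation):
    """Return 'male', 'female', or 'neutral' based on pronouns present."""
    tokens = set(re.findall(r"[a-z']+", translation.lower()))
    has_m, has_f = bool(tokens & MALE), bool(tokens & FEMALE)
    if has_m and not has_f:
        return "male"
    if has_f and not has_m:
        return "female"
    return "neutral"

translations = [
    "He is a professor.",
    "She is a nurse.",
    "That person is kind.",
]
counts = {"male": 0, "female": 0, "neutral": 0}
for t in translations:
    counts[classify(t)] += 1
print(counts)  # {'male': 1, 'female': 1, 'neutral': 1}
```

Aggregating such counts over occupation and sentiment word lists is what lets these studies quantify whether a system skews toward one gendered pronoun for neutral sources.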