Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
DOI: 10.18653/v1/n19-3002

Identifying and Reducing Gender Bias in Word-Level Language Models

Abstract: Many text corpora exhibit socially problematic biases, which can be propagated or amplified in the models trained on such data. For example, doctor co-occurs more frequently with male pronouns than with female pronouns. In this study we (i) propose a metric to measure gender bias; (ii) measure bias in a text corpus and in the text generated by a recurrent neural network language model trained on that corpus; (iii) propose a regularization loss term for the language model that minimizes the projection of encoder-trained embeddings onto an embedding subspace that encodes gender; and (iv) evaluate the efficacy of the proposed method at reducing gender bias.
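The abstract names two concrete components: a co-occurrence bias metric and a projection-based regularization term. The PyTorch sketch below illustrates both under stated assumptions; the function names, pronoun lists, context window, and λ value are ours for illustration, not the paper's exact implementation.

```python
import torch

# Illustrative pronoun lists; the paper's gendered word sets may differ.
FEMALE = {"she", "her", "hers"}
MALE = {"he", "him", "his"}

def bias_score(word, tokens, window=10):
    """Co-occurrence bias metric: log-ratio of how often `word` appears
    near female vs. male pronouns. 0 means parity; counts are add-one
    smoothed so the log is always defined."""
    f, m = 1, 1
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        context = tokens[max(0, i - window): i + window + 1]
        f += sum(t in FEMALE for t in context)
        m += sum(t in MALE for t in context)
    return float(torch.log(torch.tensor(f / m)))

def gender_projection_penalty(emb, basis, lam=0.5):
    """Regularization term lam * ||E @ B||_F^2: the squared Frobenius
    norm of the projection of the (vocab x dim) embedding matrix `emb`
    onto a (dim x k) orthonormal gender-subspace basis `basis` (e.g.,
    principal components of difference vectors such as she-he).
    Minimizing it pushes embeddings toward gender neutrality."""
    return lam * torch.norm(emb @ basis) ** 2

# Hypothetical use inside a language-model training step:
#   loss = cross_entropy(logits, targets) \
#          + gender_projection_penalty(model.encoder.weight, B)
```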

Cited by 143 publications (113 citation statements: 1 supporting, 112 mentioning, 0 contrasting). References 16 publications (17 reference statements).
“…After training the baseline model, we implement our loss function and tune the λ hyperparameter. We also test the existing debiasing approaches, CDA and REG; since Bordia and Bowman (2019) reported that results fluctuate substantially with different REG regularization coefficients, we perform hyperparameter tuning and report the best results in Table 2. Additionally, we implement a combination of our loss function and CDA and tune for λ.…”
Section: Methods (mentioning)
confidence: 99%
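The excerpt above describes sweeping the regularization coefficient λ and reporting the best run. As a hedged illustration only, here is one way such a sweep could look; the grid, the train_and_eval helper, and the selection rule are our assumptions, not the cited paper's procedure.

```python
def tune_lambda(train_and_eval, grid=(0.01, 0.1, 0.5, 1.0, 10.0)):
    """Hypothetical lambda sweep. `train_and_eval(lam)` is assumed to
    train the language model with regularization weight `lam` and return
    (validation perplexity, absolute bias score)."""
    results = {lam: train_and_eval(lam) for lam in grid}
    best_ppl = min(ppl for ppl, _ in results.values())
    # Keep runs whose perplexity stays within 5% of the best, then pick
    # the least-biased one -- a simple quality/bias trade-off rule.
    admissible = {lam: bias for lam, (ppl, bias) in results.items()
                  if ppl <= 1.05 * best_ppl}
    return min(admissible, key=admissible.get)
```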
“…We also implement the bias regularization method of Bordia and Bowman (2019), which debiases the word embedding during language model training by minimizing the projection of neutral words on the gender axis. We use hyperparameter tuning to find the best regularization coefficient and report results from the model trained with this coefficient.…”
Section: Existing Approaches (mentioning)
confidence: 99%
“…Zhao et al. (2019) and Basta et al. (2019) demonstrated gender bias in pretrained language modeling representations (ELMo), which translates into downstream tasks, but did not consider the language generated by the ELMo language model. Bordia and Bowman (2019), as well as Qian et al. (2019), identified biases in a language modeling context and proposed regularization strategies that equalize the probability of generating certain words (e.g., “doctor”) given differently gendered inputs.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…The study of biases in NLP systems is an active subfield. The majority of the work in the area is dedicated to pre-trained models, often via similarity-based analysis of the biases in input representations (Bolukbasi et al., 2016a; Garg et al., 2018; Chaloner and Maldonado, 2019; Bordia and Bowman, 2019; Tan and Celis, 2019; Zhao et al., 2019, 2020), or an intermediate classification task (Recasens et al., 2013).…”
Section: Related Work (mentioning)
confidence: 99%