Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.647
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

Abstract: The ability to control for the kinds of information encoded in neural representation has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations. Our method is based on repeated training of linear classifiers that predict a certain property we aim to remove, followed by projection of the representations on their null-space. By doing so, the classifiers become…
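The abstract's loop (train a linear classifier for the protected attribute, project the representations onto its nullspace, repeat) can be sketched as follows. This is a minimal illustration, not the paper's implementation: for simplicity it uses the class-mean difference as the linear direction in each round, a stand-in for the trained linear classifiers (e.g. linear SVMs) the paper describes.

```python
import numpy as np

def nullspace_projection(w):
    """P = I - w w^T / ||w||^2: projects vectors onto the nullspace of direction w."""
    w = w / np.linalg.norm(w)
    return np.eye(w.shape[0]) - np.outer(w, w)

def inlp(X, y, n_iters=5):
    """Minimal sketch of Iterative Nullspace Projection (INLP).

    Each round fits a linear predictor of the protected attribute y
    (here the class-mean difference direction, standing in for a
    trained linear classifier) and projects the representations onto
    its nullspace, removing that linearly decodable component.
    """
    X_proj = X.astype(float).copy()
    P = np.eye(X.shape[1])          # accumulated projection matrix
    for _ in range(n_iters):
        w = X_proj[y == 1].mean(axis=0) - X_proj[y == 0].mean(axis=0)
        if np.linalg.norm(w) < 1e-10:
            break                   # attribute no longer recoverable by this predictor
        Pn = nullspace_projection(w)
        P = Pn @ P                  # compose with previous projections
        X_proj = X_proj @ Pn        # Pn is symmetric, so X @ Pn projects each row
    return X_proj, P
```

Because projection is linear, each round makes the two class means coincide along the removed direction; iterating removes successive linearly decodable directions.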


Cited by 167 publications (296 citation statements). References 28 publications (39 reference statements).
“…They conclude that adversarial learning alone does not guarantee invariant representations for the protected attributes. Ravfogel et al (2020) found that iteratively projecting word embeddings onto the null space of the gender direction further improves debiasing performance.…”
Section: Related Work
confidence: 99%
“…Splitting the representations into components is done using INLP (Ravfogel et al, 2020), an algorithm for removing information from vector representations.…”
Section: Dissecting mBERT Representations
confidence: 99%
“…Such studies on model bias have led to many bias mitigation techniques (e.g., Bolukbasi et al, 2016b; Dev et al, 2020a; Ravfogel et al, 2020; Dev et al, 2020b). In this work, we focus on exploring biases across QA models and expect that our framework could also help future efforts on bias mitigation.…”
Section: Related Work
confidence: 99%