Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.416
StereoSet: Measuring stereotypical bias in pretrained language models

Abstract: A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or African Americans are athletic. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real world data, they are known to capture stereotypical biases. It is important to quantify to what extent these biases are present in them. Although this is a rapidly growing area of research, existing literature lacks in two important aspects: 1) they mainly …
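The abstract is about quantifying how strongly a pretrained model prefers stereotypical associations. As a rough illustration only (not the paper's exact scoring protocol), the sketch below compares a stereotypical and an anti-stereotypical sentence under a masked language model using a pseudo-log-likelihood score, i.e. each token masked in turn; the model name and the example sentences are placeholders, not items from StereoSet.

```python
# Minimal sketch: which of two near-identical sentences does a masked LM prefer?
# Scoring is a pseudo-log-likelihood: mask each token in turn and sum log P(token | rest).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log P(token | rest of sentence) with each token masked in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip [CLS] (position 0) and [SEP] (last position).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

# Illustrative pair; the two sentences differ by one word, so the raw sums are comparable.
stereo = "The girls were never good at math."
anti = "The girls were always good at math."
preferred = "stereotype" if pseudo_log_likelihood(stereo) > pseudo_log_likelihood(anti) else "anti-stereotype"
print("model prefers:", preferred)
```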

Cited by 262 publications (338 citation statements)
References 26 publications (32 reference statements)
“…In general, machine learning has the ability to amplify biases presented implicitly and explicitly in the training data. Models that we reference in our study are based on BERT, which has been shown to learn and exacerbate stereotypes during training (e.g., Kurita et al 2019, Tan and Celis 2019, Nadeem et al 2021). We further train these models on Wikidata triples, which again has the potential to amplify harmful and toxic biases.…”
Section: Ethical Considerations
confidence: 99%
“…For contextualized embeddings, similar methods to alleviate the issue of undesirable biases and toxicity have been proposed (Dev et al, 2020; Nangia et al, 2020; Nadeem et al, 2020; Krause et al, 2020; Kaneko and Bollegala, 2021a). For text generation, Gehman et al (2020) propose domain-adaptive pretraining on non-toxic corpora as outlined by Gururangan et al (2020) and consider plug and play language models (Dathathri et al, 2020).…”
Section: Related Work
confidence: 99%
“…One approach to examining the behaviour of language models like BERT is to examine how they rank certain representative examples above others. We use two contemporary datasets that measure how often stereotypes are ranked above anti-stereotypes: StereoSet (Nadeem et al, 2020) and CrowS-Pairs (Nangia et al, 2020). Both datasets report a stereotype score (ss), where ss = 100 means the model always prefers the stereotype. StereoSet (Nadeem et al, 2020) propose a benchmark that contains intra-sentence and inter-sentence examples of stereotypes and anti-stereotypes.…”
Section: Likelihood-based Diagnostics
confidence: 99%
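The statement above describes ranking stereotypes against anti-stereotypes and aggregating how often the stereotype wins. Below is a minimal sketch of that headline quantity, assuming some sentence scorer such as the pseudo_log_likelihood function sketched earlier; the example pairs are illustrative and not drawn from StereoSet or CrowS-Pairs. Note that StereoSet's full Context Association Test also includes unrelated options and a language modeling score, which this sketch omits.

```python
# Minimal sketch: percentage of pairs for which a model ranks the stereotypical
# sentence above the anti-stereotypical one (roughly 50 for an unbiased model,
# 100 for a fully stereotyped one).
from typing import Callable, Iterable, List, Tuple

def stereotype_score(pairs: Iterable[Tuple[str, str]],
                     score: Callable[[str], float]) -> float:
    """pairs: (stereotype_sentence, anti_stereotype_sentence) tuples."""
    pair_list: List[Tuple[str, str]] = list(pairs)
    preferred = sum(score(stereo) > score(anti) for stereo, anti in pair_list)
    return 100.0 * preferred / len(pair_list)

# Illustrative pairs only (not dataset items):
example_pairs = [
    ("The nurse said she would be late.", "The nurse said he would be late."),
    ("The engineer finished his design.", "The engineer finished her design."),
]
# ss = stereotype_score(example_pairs, pseudo_log_likelihood)
```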
“…Our findings demonstrate that model diagnostics can be unreliable on multiple fronts. To illustrate our point, we select three diagnostic tasks, StereoSet (Nadeem et al, 2020), CrowS-Pairs (Nangia et al, 2020), and SEATs (May et al, 2019), to base our empirical evaluation on. Overall, we find that likelihood-based and representation-based diagnostics measured multiple times on the same training setup can result in wildly different findings.…”
Section: Introduction
confidence: 99%
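The quoted passage reports that the same diagnostic, measured multiple times on the same training setup, can give very different results. A minimal sketch of that kind of reliability check follows, assuming several checkpoints trained with identical hyperparameters but different random seeds; the checkpoint paths and the evaluation callable are hypothetical placeholders, not artifacts from the cited work.

```python
# Minimal sketch: run one diagnostic (e.g. a stereotype score) on several
# same-setup, different-seed checkpoints and report the spread of the scores.
import statistics
from typing import Callable, Iterable, List, Tuple

def diagnostic_spread(checkpoints: Iterable[str],
                      evaluate: Callable[[str], float]) -> Tuple[float, float, List[float]]:
    """evaluate(checkpoint_path) -> scalar diagnostic score for that checkpoint."""
    scores = [evaluate(ckpt) for ckpt in checkpoints]
    return statistics.mean(scores), statistics.pstdev(scores), scores

# checkpoints = ["runs/seed-0", "runs/seed-1", "runs/seed-2"]  # hypothetical paths
# mean, spread, scores = diagnostic_spread(checkpoints, my_stereoset_eval)
# A large spread relative to the mean is the kind of instability the authors describe.
```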