Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.416
StereoSet: Measuring stereotypical bias in pretrained language models

Abstract: A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or African Americans are athletic. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real world data, they are known to capture stereotypical biases. It is important to quantify to what extent these biases are present in them. Although this is a rapidly growing area of research, existing literature lacks in two important aspects: 1) they mainly …
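The abstract is about quantifying how strongly a pretrained model prefers stereotypical associations. As a rough illustration only (not the paper's exact scoring protocol), the sketch below compares a stereotypical and an anti-stereotypical sentence under a masked language model using a pseudo-log-likelihood score, i.e. each token masked in turn; the model name and the example sentences are placeholders, not items from StereoSet.

```python
# Minimal sketch: which of two near-identical sentences does a masked LM prefer?
# Scoring is a pseudo-log-likelihood: mask each token in turn and sum log P(token | rest).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log P(token | rest of sentence) with each token masked in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip [CLS] (position 0) and [SEP] (last position).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

# Illustrative pair; the two sentences differ by one word, so the raw sums are comparable.
stereo = "The girls were never good at math."
anti = "The girls were always good at math."
preferred = "stereotype" if pseudo_log_likelihood(stereo) > pseudo_log_likelihood(anti) else "anti-stereotype"
print("model prefers:", preferred)
```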

Cited by 262 publications (338 citation statements)
References 26 publications (32 reference statements)
“…In general, machine learning has the ability to amplify biases presented implicitly and explicitly in the training data. Models that we reference in our study are based on BERT, which has been shown to learn and exacerbate stereotypes during training (e.g., Kurita et al 2019, Tan and Celis 2019, Nadeem et al 2021). We further train these models on Wikidata triples, which again has the potential to amplify harmful and toxic biases.…”
Section: Ethical Considerations
confidence: 99%
“…For contextualized embeddings, similar methods to alleviate the issue of undesirable biases and toxicity have been proposed (Dev et al, 2020; Nangia et al, 2020; Nadeem et al, 2020; Krause et al, 2020; Kaneko and Bollegala, 2021a). For text generation, Gehman et al (2020) propose domain-adaptive pretraining on non-toxic corpora as outlined by Gururangan et al (2020) and consider plug and play language models (Dathathri et al, 2020).…”
Section: Related Work
confidence: 99%
“…One approach to examining the behaviour of language models like BERT is to examine how they rank certain representative examples above others. We use two contemporary datasets that measure how often stereotypes are ranked above anti-stereotypes: StereoSet (Nadeem et al, 2020) and CrowS-Pairs (Nangia et al, 2020). Both datasets report a stereotype score (ss), where ss = 100 means the model always prefers the stereotype. StereoSet (Nadeem et al, 2020) propose a benchmark that contains intra-sentence and inter-sentence examples of stereotypes and anti-stereotypes.…”
Section: Likelihood-based Diagnostics
confidence: 99%
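The statement above describes ranking stereotypes against anti-stereotypes and aggregating how often the stereotype wins. Below is a minimal sketch of that headline quantity, assuming some sentence scorer such as the pseudo_log_likelihood function sketched earlier; the example pairs are illustrative and not drawn from StereoSet or CrowS-Pairs. Note that StereoSet's full Context Association Test also includes unrelated options and a language modeling score, which this sketch omits.

```python
# Minimal sketch: percentage of pairs for which a model ranks the stereotypical
# sentence above the anti-stereotypical one (roughly 50 for an unbiased model,
# 100 for a fully stereotyped one).
from typing import Callable, Iterable, List, Tuple

def stereotype_score(pairs: Iterable[Tuple[str, str]],
                     score: Callable[[str], float]) -> float:
    """pairs: (stereotype_sentence, anti_stereotype_sentence) tuples."""
    pair_list: List[Tuple[str, str]] = list(pairs)
    preferred = sum(score(stereo) > score(anti) for stereo, anti in pair_list)
    return 100.0 * preferred / len(pair_list)

# Illustrative pairs only (not dataset items):
example_pairs = [
    ("The nurse said she would be late.", "The nurse said he would be late."),
    ("The engineer finished his design.", "The engineer finished her design."),
]
# ss = stereotype_score(example_pairs, pseudo_log_likelihood)
```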
“…Our findings demonstrate that model diagnostics can be unreliable on multiple fronts. To illustrate our point, we select three diagnostic tasks, StereoSet (Nadeem et al, 2020), CrowS-Pairs (Nangia et al, 2020), and SEATs (May et al, 2019), to base our empirical evaluation on. Overall, we find that likelihood-based and representation-based diagnostics measured multiple times on the same training setup can result in wildly different findings.…”
Section: Introduction
confidence: 99%
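The quoted passage reports that the same diagnostic, measured multiple times on the same training setup, can give very different results. A minimal sketch of that kind of reliability check follows, assuming several checkpoints trained with identical hyperparameters but different random seeds; the checkpoint paths and the evaluation callable are hypothetical placeholders, not artifacts from the cited work.

```python
# Minimal sketch: run one diagnostic (e.g. a stereotype score) on several
# same-setup, different-seed checkpoints and report the spread of the scores.
import statistics
from typing import Callable, Iterable, List, Tuple

def diagnostic_spread(checkpoints: Iterable[str],
                      evaluate: Callable[[str], float]) -> Tuple[float, float, List[float]]:
    """evaluate(checkpoint_path) -> scalar diagnostic score for that checkpoint."""
    scores = [evaluate(ckpt) for ckpt in checkpoints]
    return statistics.mean(scores), statistics.pstdev(scores), scores

# checkpoints = ["runs/seed-0", "runs/seed-1", "runs/seed-2"]  # hypothetical paths
# mean, spread, scores = diagnostic_spread(checkpoints, my_stereoset_eval)
# A large spread relative to the mean is the kind of instability the authors describe.
```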