StereoSet: Measuring stereotypical bias in pretrained language models

Nadeem, Moin; Bethke, Anna; Reddy, Siva

doi:10.48550/arxiv.2004.09456

Cited by 75 publications

(129 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Whether using such unlabeled data, as we do in this work, can help with bias is still an open question. Previous work suggests that training on large amounts of data alone is not sufficient to avoid unwanted biases, since many papers have pointed out biases in large language models (Abid et al, 2021;Nadeem et al, 2020;Gehman et al, 2020). However, recent work has also suggested that pre-trained models can be trained to be more robust against some types of spurious correlations (Hendrycks et al, 2020;Tu et al, 2020) and that additional domain-and task-specific pre-training can also improve performance.…”

Section: A7 Civilcomments-wildsmentioning

confidence: 99%

Extending the WILDS Benchmark for Unsupervised Adaptation

Sagawa¹,

Koh²,

Lee³

et al. 2021

Preprint

View full text Add to dashboard Cite

Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks for unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the Wilds 2.0 update, which extends 8 of the 10 datasets in the Wilds benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. To maintain consistency, the labeled training, validation, and test sets, as well as the evaluation metrics, are exactly the same as in the original Wilds benchmark. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on Wilds is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.

show abstract

Section: A7 Civilcomments-wildsmentioning

confidence: 99%

Extending the WILDS Benchmark for Unsupervised Adaptation

Sagawa¹,

Koh²,

Lee³

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Large language models have achieved impressive results on many tasks; however, there is also significant evidence demonstrating that they are prone to biases [4], [38], [39]. Debiasing these models remains largely an open problem: most in-processing algorithms are not applicable or computationally prohibitive due to large and highly complex model architectures, and challenges in handling text inputs.…”

Section: Post-processing For Debiasing Large Language Modelsmentioning

confidence: 99%

Post-processing for Individual Fairness

Petersen¹,

Mukherjee²,

Sun³

et al. 2021

Preprint

View full text Add to dashboard Cite

Post-processing in algorithmic fairness is a versatile approach for correcting bias in ML systems that are already used in production. The main appeal of postprocessing is that it avoids expensive retraining. In this work, we propose general post-processing algorithms for individual fairness (IF). We consider a setting where the learner only has access to the predictions of the original model and a similarity graph between individuals, guiding the desired fairness constraints. We cast the IF post-processing problem as a graph smoothing problem corresponding to graph Laplacian regularization that preserves the desired "treat similar individuals similarly" interpretation. Our theoretical results demonstrate the connection of the new objective function to a local relaxation of the original individual fairness. Empirically, our post-processing algorithms correct individual biases in large-scale NLP models such as BERT, while preserving accuracy.

show abstract

“…However, pretrained LMs are well-known for exhibiting unintended social biases involving race, gender, or religion [28,31,42]. These biases result from unfair allocation of resources [20,51], stereotyping that propagates negative generalizations about particular social groups [35], as well as differences in system performance for different social groups, text that misrepresents the distribution of different social groups in the population, or language that is denigrating to particular social groups [4,18,28]. Moreover, these biases may also be exacerbated by biases used for domain-specific LM fine-tuning used for downstream tasks [22,35].…”

Section: Introductionmentioning

confidence: 99%

Unintended Bias in Language Model-driven Conversational Recommendation

Shen¹,

Li²,

Bouadjenek³

et al. 2022

Preprint

View full text Add to dashboard Cite

Conversational Recommendation Systems (CRSs) have recently started to leverage pretrained language models (LM) such as BERT for their ability to semantically interpret a wide range of preference statement variations. However, pretrained LMs are well-known to be prone to intrinsic biases in their training data, which may be exacerbated by biases embedded in domain-specific language data (e.g., user reviews) used to fine-tune LMs for CRSs. We study a recently introduced LM-driven recommendation backbone (termed LMRec) of a CRS to investigate how unintended bias -i.e., language variations such as name references or indirect indicators of sexual orientation or location that should not affect recommendations -manifests in significantly shifted price and category distributions of restaurant recommendations. The alarming results we observe strongly indicate that LMRec has learned to reinforce harmful stereotypes through its recommendations. For example, offhand mention of names associated with the black community significantly lowers the price distribution of recommended restaurants, while offhand mentions of common male-associated names lead to an increase in recommended alcohol-serving establishments. These and many related results presented in this work raise a red flag that advances in the language handling capability of LM-driven CRSs do not come without significant challenges related to mitigating unintended bias in future deployed CRS assistants with a potential reach of hundreds of millions of end users.

show abstract

StereoSet: Measuring stereotypical bias in pretrained language models

Cited by 75 publications

References 0 publications

Extending the WILDS Benchmark for Unsupervised Adaptation

Extending the WILDS Benchmark for Unsupervised Adaptation

Post-processing for Individual Fairness

Unintended Bias in Language Model-driven Conversational Recommendation

Contact Info

Product

Resources

About