2022
DOI: 10.48550/arxiv.2205.09209
Preprint

"I'm sorry to hear that": finding bias in language models with a holistic descriptor dataset

Abstract: As language models grow in popularity, their biases across all possible markers of demographic identity should be measured and addressed in order to avoid perpetuating existing societal harms. Many datasets for measuring bias currently exist, but they are restricted in their coverage of demographic axes, and are commonly used with preset bias tests that presuppose which types of biases the models exhibit. In this work, we present a new, more inclusive dataset, HOLISTICBIAS, which consists of nearly 600 descrip…

Cited by 4 publications (4 citation statements)
References 33 publications (49 reference statements)
“…In open-ended language generation, prompts are often used to assess the extent to which LMs yield undesirable output. Various benchmarks, such as BOLD [17], HONEST [49], HolisticBias [61], and RealToxicityPrompts [24], exist for this purpose. Choenni et al. [13] prompt language models to assess to what extent they have learnt stereotypes.…”
Section: Content Moderation in Language Models
Citation type: mentioning; confidence: 99%
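
As a rough illustration of the prompt-based probing this statement describes, the sketch below feeds templated prompts to a small causal LM through the HuggingFace transformers pipeline. GPT-2, the templates, and the descriptor terms are illustrative stand-ins, not the contents of any of the cited benchmarks.

# A minimal sketch of prompt-based bias probing. GPT-2, the templates,
# and the descriptors are hypothetical stand-ins for a real benchmark.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

templates = ["I'm a {descriptor} person, ask me anything.",
             "Our new neighbors are {descriptor}."]
descriptors = ["blind", "Deaf", "left-handed"]  # illustrative subset

for template in templates:
    for descriptor in descriptors:
        prompt = template.format(descriptor=descriptor)
        continuation = generator(prompt, max_new_tokens=30,
                                 num_return_sequences=1)[0]["generated_text"]
        # In a real evaluation, a toxicity or sentiment classifier would
        # score each continuation, aggregated per demographic group.
        print(prompt, "->", continuation[len(prompt):].strip())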
“…An additional study presents a large GBET dataset called HOLISTICBIAS for measuring bias. This dataset is assembled by combining a set of demographic descriptor terms with a set of bias measurement templates, and can be used to test bias in language models (Smith et al., 2022).…”
Section: Measuring Gender Bias
Citation type: mentioning; confidence: 99%
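
The assembly this statement describes is essentially a cross product of descriptor terms and sentence templates. A minimal sketch follows; the axes, descriptors, nouns, and templates are made-up stand-ins for the real dataset entries.

# Sketch of HolisticBias-style dataset assembly: descriptor terms grouped
# by demographic axis are slotted into sentence templates. All entries
# below are hypothetical, not the actual descriptor/template lists.
from itertools import product

descriptors_by_axis = {
    "ability": ["blind", "hard-of-hearing"],
    "age": ["young", "middle-aged"],
}
nouns = ["person", "parent"]
templates = ["I'm a {descriptor} {noun}.",
             "It's hard being a {descriptor} {noun}."]

dataset = [
    {"axis": axis, "descriptor": d, "noun": n,
     "text": t.format(descriptor=d, noun=n)}
    for axis, axis_descriptors in descriptors_by_axis.items()
    for d, n, t in product(axis_descriptors, nouns, templates)
]

print(len(dataset))        # 16 templated sentences (2 x 2 x 2 x 2)
print(dataset[0]["text"])  # "I'm a blind person."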
“…In addition to standard fairness evaluation datasets such as CrowS-Pairs (Nangia et al., 2020), and template-based fairness measurements such as the Word Embedding Association Test (WEAT) (Caliskan et al., 2017) and the Sentence Encoder Association Test (SEAT) (May et al., 2019), we also incorporate a new, larger bias measurement dataset, HolisticBias (HB), which was created with a combination of algorithmic and participatory processes to develop the most comprehensive descriptor term list available (Smith et al., 2022). [Footnote 7: Although many of the fairness metrics that are standard in NLP have flaws, we unfortunately have few alternatives.] We calculate the per-axis bias by measuring the fraction of pairs of descriptors in the HB dataset for which the distribution of pseudo-log-likelihoods (Nangia et al., 2020) in templated sentences significantly differs.…”
Section: FairBERTa Is More Fair
Citation type: mentioning; confidence: 99%
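
The per-axis measure quoted above can be sketched concretely: score each templated sentence with a masked-LM pseudo-log-likelihood (Nangia et al., 2020), then count descriptor pairs whose score distributions differ significantly. In the sketch below, the choice of roberta-base, the templates, and the Wilcoxon signed-rank test at p < 0.05 are assumptions for illustration, not the cited paper's exact configuration.

# Sketch of a per-axis bias measure: for each pair of descriptors on an
# axis, compare distributions of masked-LM pseudo-log-likelihoods over
# templated sentences; report the fraction of pairs that differ
# significantly. Model, templates, and test are illustrative assumptions.
from itertools import combinations

import torch
from scipy.stats import wilcoxon
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def pseudo_log_likelihood(sentence):
    # Sum of each token's log-probability when that token alone is masked.
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip BOS/EOS special tokens
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

templates = ["I'm a {d} person.",
             "As a {d} person, I have a question.",
             "Nobody asked the {d} person.",
             "My best friend is a {d} person.",
             "The {d} person went home."]
axis_descriptors = ["blind", "Deaf", "hard-of-hearing"]  # one axis, illustrative

plls = {d: [pseudo_log_likelihood(t.format(d=d)) for t in templates]
        for d in axis_descriptors}

# Note: five sentences per descriptor is too few for the test to have
# power; a real evaluation scores many more templated sentences.
pairs = list(combinations(axis_descriptors, 2))
significant = sum(wilcoxon(plls[a], plls[b]).pvalue < 0.05 for a, b in pairs)
print(f"per-axis bias: {significant}/{len(pairs)} = {significant / len(pairs):.2f}")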