Social Biases in NLP Models as Barriers for Persons with Disabilities

Hutchinson, Ben; Prabhakaran, Vinodkumar; Denton, Emily; Webster, Kellie; Zhong, Yu; Denuyl, Stephen

doi:10.18653/v1/2020.acl-main.487

Cited by 139 publications

(113 citation statements)

References 29 publications

Supporting; Mentioning: 112; Contrasting; Unclassified


“…Although widely used, the PERSPECTIVE API and other hate speech detection systems and corpora exhibit biases against minorities and suffer from low agreement in annotations (Waseem, 2016; Ross et al., 2017), partially due to annotator identity influencing their perception of hate speech (Cowan and Khatchadourian, 2003) and differences in annotation task setup (Sap et al., 2019). Notably, recent work has found that systems are overestimating the prevalence of toxicity in text that contains a minority identity mention (e.g., "I'm a gay man"; Hutchinson et al., 2020) or text by racial minorities (e.g., text in African American English; Sap et al., 2019; Davidson et al., 2019). This is partially due to detectors' over-reliance on lexical cues of toxicity (including swearwords, slurs, and other "bad" words…”

Section: Biases In Toxic Language Detection (mentioning)

confidence: 99%

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

Gehman, Gururangan, Sap et al. 2020

Findings of the Association for Computational Linguistics: EMNLP 2020



Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration. We create and release REALTOXICITYPROMPTS, a dataset of 100K naturally occurring, sentence-level prompts derived from a large corpus of English web text, paired with toxicity scores from a widely used toxicity classifier. Using REALTOXICITYPROMPTS, we find that pretrained LMs can degenerate into toxic text even from seemingly innocuous prompts. We empirically assess several controllable generation methods, and find that while data- or compute-intensive methods (e.g., adaptive pretraining on non-toxic data) are more effective at steering away from toxicity than simpler solutions (e.g., banning "bad" words), no current method is failsafe against neural toxic degeneration. To pinpoint the potential cause of such persistent toxic degeneration, we analyze two web text corpora used to pretrain several LMs (including GPT-2; Radford et al., 2019), and find a significant amount of offensive, factually unreliable, and otherwise toxic content. Our work provides a test bed for evaluating toxic generations by LMs and stresses the need for better data selection processes for pretraining.

Footnote 10: Oversampling toxicity is necessary since it is a relatively rare phenomenon online (Founta et al., 2018).


“…However, the mismatch between the construct of toxicity and its operationalization through an automatic classifier can cause biased or unintended model behavior (Jacobs and Wallach, 2021). Specifically, recent work has shown that such hate speech classifiers overestimate the prevalence of toxicity in text that contains a minority identity mention (Hutchinson et al., 2020; Dixon et al., 2018) or text written by racial minorities (Sap et al., 2019; Davidson et al., 2019), therefore having the real possibility of backfiring against its very aim of fairness and inclusive dialogue. To address this limitation, we also perform a human evaluation of toxicity, for which we obtained IRB approval and sought to pay our workers a fair wage (~US$7-9/h).…”

Section: Broader Impact and Ethical Implications (mentioning)

confidence: 99%

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

Liu, Sap, Lu et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DEXPERTS: Decoding-time Experts, a decoding-time method for controlled text generation that combines a pretrained language model with "expert" LMs and/or "anti-expert" LMs in a product of experts. Intuitively, under the ensemble, tokens only get high probability if they are considered likely by the experts and unlikely by the anti-experts. We apply DEXPERTS to language detoxification and sentiment-controlled generation, where we outperform existing controllable generation methods on both automatic and human evaluations. Moreover, because DEXPERTS operates only on the output of the pretrained LM, it is effective with (anti-)experts of smaller size, including when operating on GPT-3. Our work highlights the promise of tuning small LMs on text with (un)desirable attributes for efficient decoding-time steering.
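The product-of-experts intuition in this abstract can be illustrated with a minimal sketch. It assumes the ensemble adds the expert's next-token logits and subtracts the anti-expert's, scaled by a weight `alpha`; that combination rule, the function names, and the toy numbers are illustrative assumptions, not details given in the abstract. Adding logits corresponds to multiplying the underlying probabilities, which is what makes this a product of experts.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(base, expert, anti_expert, alpha=1.0):
    """Hypothetical helper: base + alpha * (expert - anti_expert), per token.

    Assumed combination rule for illustration only. A token is boosted when
    the expert favors it and penalized when the anti-expert favors it.
    """
    return [b + alpha * (e - a) for b, e, a in zip(base, expert, anti_expert)]


# Toy three-token vocabulary; every number below is made up.
vocab = ["kind", "neutral", "rude"]
base = [1.0, 1.0, 1.0]         # base LM is indifferent
expert = [2.0, 1.0, 0.0]       # expert (e.g. non-toxic LM) prefers "kind"
anti_expert = [0.0, 1.0, 2.0]  # anti-expert (e.g. toxic LM) prefers "rude"

probs = softmax(combine_logits(base, expert, anti_expert, alpha=1.0))
best_token = vocab[probs.index(max(probs))]
```

With `alpha=0.0` the helper returns the base logits unchanged, so the weight acts as a dial on how strongly the ensemble steers the base model.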


“…We also have the extensive study by Leavy et al. [74] on CBOW trained on articles from The Guardian journal and the British Digital Library. Also, Hutchinson et al. [82] study the perception of models towards disabled people, and Bhardwaj et al. [78] combine the study of gender bias on BERT by sentiment analysis with gender separability.…”

Section: Association Tests (mentioning)

confidence: 99%

A Survey on Bias in Deep NLP

Garrido-Muñoz, Montejo-Ráez, Martínez-Santiago et al. 2021

Preprint


Deep neural networks are hegemonic approaches to many machine learning areas, including natural language processing (NLP). Thanks to the availability of large corpora collections and the capability of deep architectures to shape internal language mechanisms in self-supervised learning processes (also known as "pre-training"), versatile and well-performing models are released continuously for every new network design. But these networks, somehow, learn a probability distribution of words and relations across the training collection used, inheriting the potential flaws, inconsistencies and biases contained in such a collection. As pre-trained models have been found to be very useful approaches to transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias in a formal way and explore how it has been treated in several networks, in terms of detection and correction. Also, available resources are identified and a strategy to deal with bias in deep NLP is proposed.
