Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.116
Debiasing Methods in Natural Language Understanding Make Bias More Accessible

Abstract: Model robustness to bias is often determined by the generalization on carefully designed out-of-distribution datasets. Recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions. An underlying assumption behind such methods is that this also leads to the discovery of more robust features in the model's inner representations. We propose a general probing-based framework that allows for posthoc interpretation of bias…

Cited by 5 publications (4 citation statements)
References 25 publications
“…Our findings suggest that practitioners of NLP should take special care when adopting previously debiased models and inspect them carefully, perhaps using our framework. Our results differ from those of Mendelson and Belinkov (2021a), who found that debiasing increases bias extractability as measured by compression rate. However, they studied different, non-social biases that arise from spurious or unintended correlations in training datasets (often called dataset biases).…”
Section: Discussion (contrasting)
confidence: 99%
See 1 more Smart Citation
“…Our findings suggest that practitioners of NLP should take special care when adopting previously debiased models and inspect them carefully, perhaps using our framework. Our results differ from those of Mendelson and Belinkov (2021a), who found that the debiasing increases bias extractability as measured by compression rate. However, they studied different, non-social biases, that arise from spurious or unintended correlations in training datasets (often called dataset biases).…”
Section: Discussioncontrasting
confidence: 99%
“…We use the MDL probe (Voita and Titov, 2020) implementation by Mendelson and Belinkov (2021b). In all experiments, we use a linear probe and train it with a batch size of 16 and a learning rate of 1e-3.…”
Section: A3 Probing Classifier (mentioning)
confidence: 99%
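The probe configuration quoted above (a linear probe, batch size 16, learning rate 1e-3) can be sketched as plain mini-batch SGD over frozen representations. This is a minimal illustration only: the actual MDL probe of Voita and Titov (2020) additionally measures codelength via online coding, which is not reproduced here, and the function names and synthetic data below are hypothetical.

```python
import numpy as np

def train_linear_probe(reps, labels, n_classes, epochs=20, batch_size=16, lr=1e-3, seed=0):
    """Train a linear softmax probe on frozen representations with mini-batch SGD.

    reps:   (n, d) array of frozen model representations
    labels: (n,) integer property labels to probe for
    """
    rng = np.random.default_rng(seed)
    n, d = reps.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x, y = reps[idx], labels[idx]
            # Softmax cross-entropy gradient: probs - one_hot(y)
            logits = x @ W + b
            logits -= logits.max(axis=1, keepdims=True)
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            grad = probs
            grad[np.arange(len(idx)), y] -= 1.0
            grad /= len(idx)
            W -= lr * (x.T @ grad)
            b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, reps, labels):
    """Fraction of examples whose probed label is predicted correctly."""
    return float(((reps @ W + b).argmax(axis=1) == labels).mean())
```

In the probing setting, higher accuracy (or shorter MDL codelength) on held-out representations is read as the probed property being more extractable from the model's representations.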
“…Our work is similar, but examines fundamental questions about what models will learn from debiasing procedures. Mendelson and Belinkov (2021) show through a probing experiment that debiasing against a particular bias may increase the extent to which that bias is encoded in the inner representations of models. In this work, we study how debiasing procedures affect model behavior, as probe performance is not necessarily indicative of the information which a model actually uses to make predictive decisions (Ravichander et al., 2021; Elazar et al., 2021).…”
Section: Related Work and Background (mentioning)
confidence: 98%
“…For instance, a bias model may be trained adversarially, making the main model perform worse when the bias model performs well (Belinkov et al., 2019b; Stacey et al., 2020). Others use a bias model to modulate the main model's predictions in various ways (He et al., 2019; Karimi Mahabadi et al., 2020; Utama et al., 2020b; Sanh et al., 2021; Mendelson and Belinkov, 2021). All these approaches use discriminative models to estimate p(y | P, H).…”
Section: Mitigation Strategies (mentioning)
confidence: 99%
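One common way a bias model "modulates" the main model's predictions, as in the product-of-experts family of methods (e.g., He et al., 2019; Karimi Mahabadi et al., 2020), is to add the frozen bias model's log-probabilities to the main model's logits before applying the training loss. The sketch below is illustrative only (function names are hypothetical, and it omits the neural models and training loop); it shows just the combination step, under which the main model receives little gradient on examples the bias model already explains.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def product_of_experts_logits(main_logits, bias_log_probs):
    """Combine main-model logits with a frozen bias model's log-probabilities.

    The training loss is applied to these combined logits, so the main model
    is pushed to account for whatever the bias model cannot predict.
    """
    return main_logits + bias_log_probs
```

For intuition: if the main model is still uninformative (all-zero logits), the combined distribution simply reproduces the bias model's prediction, so a bias-aligned example contributes almost no loss and the main model is not rewarded for learning the biased shortcut.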