Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.605

Do Neural Language Models Overcome Reporting Bias?

Abstract: Mining commonsense knowledge from corpora suffers from reporting bias, over-representing the rare at the expense of the trivial (Gordon and Van Durme, 2013). We study to what extent pre-trained language models overcome this issue. We find that while their generalization capacity allows them to better estimate the plausibility of frequent but unspoken of actions, outcomes, and properties, they also tend to overestimate that of the very rare, amplifying the bias that already exists in their training corpus.
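The study probes pretrained masked language models for the plausibility of events that are common in the world but rarely written down. Below is a minimal sketch of that style of probing, assuming the Hugging Face `transformers` fill-mask pipeline with an illustrative model, prompt, and candidate words; it is not the authors' released code or exact setup.

```python
# Minimal sketch of masked-LM plausibility probing (illustrative only; not the
# authors' code). Model, prompt, and candidate words are assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Compare a frequent-but-rarely-stated action with a rare-but-newsworthy one.
prompt = "People are [MASK] every day."
for pred in fill_mask(prompt, targets=["breathing", "murdered"]):
    print(f"{pred['token_str']:>10}  score={pred['score']:.4f}")

# Reporting bias would show up as the rare event receiving a score that is
# high relative to how often it actually happens in the world.
```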

Cited by 38 publications (36 citation statements)
References 23 publications
“…Studying inconsistencies of PLM-KBs can also teach us about the organization of knowledge in the model, or lack thereof. Finally, failure to behave consistently may point to other representational issues such as the similarity between antonyms and synonyms (Nguyen et al., 2016), and overestimating events and actions (reporting bias) (Shwartz and Choi, 2020).…”
Section: Introduction (mentioning)
confidence: 99%
“…Thus, previous research into LMs as knowledge bases has not been able to fully explore the extent to which they know color (A. Rodriguez and Merlo, 2020; Shwartz and Choi, 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…Gordon and Van Durme (2013) perform a quantitative analysis using n-gram frequencies from text, finding this phenomenon particularly relevant to internet text corpora. Shwartz and Choi (2020) extend these experiments to pretrained models such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019). Similar to our work, they analyze color attribution of the form "The banana is tasty."…”
Section: Introduction (mentioning)
confidence: 98%
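For contrast with the LM probing sketch above, here is a toy version of the corpus-frequency analysis that the cited statement attributes to Gordon and Van Durme (2013): counting how often a property is asserted of a noun in raw text. The corpus and pattern below are invented for illustration and are not the cited papers' data or code.

```python
# Toy illustration of n-gram / co-occurrence counting (invented example):
# trivial properties like "yellow" are stated far less often in text than
# notable ones, which is the reporting bias at issue.
import re
from collections import Counter

corpus = [
    "the banana is tasty",
    "the banana is rotten",
    "the banana is rotten and mushy",
    "the banana is yellow",  # the trivial fact is rarely written down
]

counts = Counter()
for sentence in corpus:
    match = re.search(r"banana is (\w+)", sentence)
    if match:
        counts[match.group(1)] += 1

print(counts.most_common())  # [('rotten', 2), ('tasty', 1), ('yellow', 1)]
```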
“…Studies have demonstrated that such bias is often manifested in pretrained language models [Sun et al., 2019]. These models further tend to exaggerate patterns of stereotypes in the underlying training data and thus amplify existing biases [Zhao et al., 2017; Shwartz and Choi, 2020]. This is particularly problematic if they are used as a starting point for other models, which likely adopt the biases.…”
Section: Introduction (mentioning)
confidence: 99%