Proceedings of the First Workshop on Gender Bias in Natural Language Processing 2019
DOI: 10.18653/v1/w19-3823

Measuring Bias in Contextualized Word Representations

Abstract: Contextual word embeddings such as BERT have achieved state of the art performance in numerous NLP tasks. Since they are optimized to capture the statistical properties of training data, they tend to pick up on and amplify social stereotypes present in the data as well. In this study, we (1) propose a template-based method to quantify bias in BERT; (2) show that this method obtains more consistent results in capturing social biases than the traditional cosine-based method; and (3) conduct a case study, evaluating…
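The template-based method the abstract refers to scores how a masked language model fills a target slot with and without the attribute word present. Below is a minimal sketch of that idea using the Hugging Face transformers library; the template, target words ("he"/"she"), and attribute ("programmer") are illustrative choices, not necessarily the paper's exact experimental setup.

```python
import math

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()


def first_mask_prob(sentence: str, word: str) -> float:
    """Probability that BERT fills the first [MASK] in `sentence` with `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_pos].softmax(dim=-1)
    return probs[tokenizer.convert_tokens_to_ids(word)].item()


def log_prob_bias_score(target: str, attribute: str) -> float:
    """log( P(target | attribute given) / P(target | attribute masked) ).

    The prior masks the attribute as well, so the score isolates how much
    the attribute word shifts the model toward the target word.
    """
    p_target = first_mask_prob(f"[MASK] is a {attribute}.", target)
    p_prior = first_mask_prob("[MASK] is a [MASK].", target)
    return math.log(p_target / p_prior)


# An attribute is flagged as biased when the two targets' scores diverge:
for target in ("he", "she"):
    print(target, round(log_prob_bias_score(target, "programmer"), 3))
```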

Cited by 280 publications (321 citation statements). References 25 publications (38 reference statements).
“…To the best of our knowledge, this is the first work to target affective dimensions in pre-trained contextualized word embeddings. Our findings are consistent with prior work suggesting that contextualized embeddings capture biases from training data (Zhao et al., 2019; Kurita et al., 2019) and that these models perform best when trained on in-domain data (Alsentzer et al., 2019).…”
Section: Related Work (supporting)
confidence: 91%
“…In this work, we examine the demographic parity, equality of opportunity for the positive class, and equality of opportunity for the negative class. First, we demonstrate that there are significant differences in the log probability bias scores [36] of clinical text for different genders. These scores examine the probability of filling in the gender demographics given medical context.…”
Section: Introduction (mentioning)
confidence: 80%
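The three fairness criteria this excerpt names have standard definitions over binary predictions. The following sketch computes the between-group gap for each; the function name, toy data, and grouping variable are illustrative assumptions, not the cited paper's code.

```python
import numpy as np

def fairness_gaps(y_true, y_pred, group_a):
    """Between-group gaps for binary predictions (group_a is a boolean mask).

    demographic parity          : difference in P(y_pred = 1)
    eq. opportunity (positive)  : difference in true-positive rate
    eq. opportunity (negative)  : difference in true-negative rate
    """
    y_true, y_pred, group_a = map(np.asarray, (y_true, y_pred, group_a))

    def rates(mask):
        selection = y_pred[mask].mean()                   # P(y_pred = 1)
        tpr = y_pred[mask & (y_true == 1)].mean()         # recall on positives
        tnr = 1.0 - y_pred[mask & (y_true == 0)].mean()   # recall on negatives
        return np.array([selection, tpr, tnr])

    gap = rates(group_a) - rates(~group_a)
    return dict(zip(["demographic_parity", "eq_opp_positive", "eq_opp_negative"], gap))

# Illustrative toy data: `group_a` marks one demographic group.
print(fairness_gaps(
    y_true=[1, 0, 1, 1, 0, 0],
    y_pred=[1, 0, 0, 1, 1, 0],
    group_a=[True, True, True, False, False, False],
))
```

A gap of zero on a criterion means the two groups are treated identically under that criterion; the three gaps generally cannot all be driven to zero at once.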
“…debiasing) word embeddings. While individual works study how contextual word embeddings capture biases [5, 36, 62], to date the creation of debiasing methods has been limited to non-contextual word embedding models (e.g. GloVe [49], Word2Vec [45]).…”
Section: Fairness of Word Embeddings (mentioning)
confidence: 99%
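For contrast with the contextual, template-based score above, the traditional cosine-based method the abstract mentions operates on the static vectors of models like GloVe or Word2Vec. A minimal sketch, assuming `emb` is a hypothetical word-to-vector lookup loaded from pre-trained embeddings:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_bias(word, pair, emb):
    """Association of `word` with one side of a definitional pair,
    e.g. pair = ("he", "she"): positive means closer to "he"."""
    return cosine(emb[word], emb[pair[0]]) - cosine(emb[word], emb[pair[1]])

# Illustrative usage with a hypothetical {word: np.ndarray} lookup `emb`:
# score = cosine_bias("programmer", ("he", "she"), emb)
```

Because each word has a single fixed vector here, the score ignores sentence context, which is the inconsistency the paper's template-based method is designed to address.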