Hate speech classifiers trained on imbalanced datasets struggle to determine whether group identifiers like "gay" or "black" are used in offensive or prejudiced ways. Such biases manifest as false positives when these identifiers are present, because models fail to learn the contexts that constitute a hateful usage of identifiers. We extract post-hoc explanations from fine-tuned BERT classifiers to detect bias towards identity terms. We then propose a novel regularization technique based on these explanations that encourages models to learn from the context of group identifiers in addition to the identifiers themselves. Our approach improves over baselines in limiting false positives on out-of-domain data while maintaining or improving in-domain performance.
* Authors contributed equally. † Code is available here.
"[F]or many Africans, the most threatening kind of ethnic hatred is black against black." - New York Times
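The regularization idea above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: token importance is approximated here with plain input gradients rather than the authors' post-hoc explanation method, and the names `explanation_regularized_loss`, `identifier_mask`, and `alpha` are assumptions introduced for this sketch.

```python
import torch
import torch.nn as nn

def explanation_regularized_loss(model, x, y, identifier_mask, alpha=0.1):
    """Binary cross-entropy plus a penalty on the importance the model
    assigns to group-identifier tokens.

    Hypothetical sketch: importance is approximated by input gradients;
    `identifier_mask` marks the input positions of identity terms, so
    penalizing attribution there nudges the model toward surrounding
    context instead of the identifiers themselves.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = nn.functional.binary_cross_entropy_with_logits(
        logits.squeeze(-1), y
    )
    # Gradient of the loss w.r.t. the inputs as a crude importance proxy.
    grads, = torch.autograd.grad(task_loss, x, create_graph=True)
    # Attribution mass falling on identifier positions is penalized.
    penalty = (grads.abs() * identifier_mask).sum()
    return task_loss + alpha * penalty
```

Because the penalty is built with `create_graph=True`, the combined loss remains differentiable and can be optimized with a standard training loop.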
Research has shown that accounting for moral sentiment in natural language can yield insight into a variety of on- and off-line phenomena, such as message diffusion, protest dynamics, and social distancing. However, measuring moral sentiment in natural language is challenging, and the difficulty of this task is exacerbated by the limited availability of annotated data. To address this issue, we introduce the Moral Foundations Twitter Corpus, a collection of 35,108 tweets that have been curated from seven distinct domains of discourse and hand-annotated by at least three trained annotators for 10 categories of moral sentiment. To facilitate investigations of annotator response dynamics, we also provide psychological and demographic metadata for each annotator. Finally, we report moral sentiment classification baselines for this corpus using a range of popular methodologies.
Majority voting and averaging are common approaches used to resolve annotator disagreements and derive single ground-truth labels from multiple annotations. However, annotators may systematically disagree with one another, often reflecting their individual biases and values, especially in the case of subjective tasks such as detecting affect, aggression, and hate speech. Annotator disagreements may capture important nuances in such tasks that are often ignored when aggregating annotations to a single ground truth. To address this, we investigate the efficacy of multi-annotator models. In particular, our multi-task approach treats predicting each annotator's judgements as a separate subtask, while sharing a common learned representation of the task. We show that this approach yields the same or better performance than aggregating labels in the data prior to training across seven different binary classification tasks. Our approach also provides a way to estimate uncertainty in predictions, which we show correlates better with annotation disagreements than traditional methods. Being able to model uncertainty is especially useful in deployment scenarios where knowing when not to make a prediction is important.
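The multi-task architecture described above can be sketched as a shared encoder with one classification head per annotator. This is a minimal illustration under assumptions: the encoder here is a stand-in for whatever text encoder the authors actually use, and deriving uncertainty from the spread of per-head predictions is one plausible reading of the disagreement-based estimate, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MultiAnnotatorModel(nn.Module):
    """Shared representation with a separate binary head per annotator,
    so each annotator's judgements are modeled as their own subtask."""

    def __init__(self, input_dim: int, hidden_dim: int, n_annotators: int):
        super().__init__()
        # Common learned representation shared across all subtasks.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
        )
        # One head per annotator.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(n_annotators)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)
        # Per-annotator probabilities, shape (batch, n_annotators).
        return torch.sigmoid(torch.cat([head(h) for head in self.heads], dim=1))

    def predict_with_uncertainty(self, x: torch.Tensor):
        probs = self.forward(x)
        # Aggregate prediction plus disagreement across heads as an
        # uncertainty estimate (one possible formulation).
        return probs.mean(dim=1), probs.std(dim=1)
```

Training would minimize the per-head losses against each annotator's labels; at deployment, a high standard deviation across heads can flag inputs where abstaining from a prediction may be warranted.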
Understanding the motivations underlying acts of hatred is essential for developing strategies to prevent such extreme behavioral expressions of prejudice (EBEPs) against marginalized groups. In this work, we investigate the motivations underlying EBEPs as a function of moral values. Specifically, we propose that EBEPs may often be best understood as morally motivated behaviors grounded in people's moral values and perceptions of moral violations. As evidence, we report five studies that integrate spatial modeling and experimental methods to investigate the relationship between moral values and EBEPs. Our results, from these U.S.-based studies, suggest that moral values oriented around group preservation are predictive of the county-level prevalence of hate groups and associated with the belief that extreme behavioral expressions of prejudice against marginalized groups are justified. Additional analyses suggest that the association between group-based moral values and EBEPs against outgroups can be partly explained by the belief that these groups have done something morally wrong.