Theory-driven text analysis has made extensive use of psychological concept dictionaries, leading to a wide range of important results. These dictionaries have generally been applied through word count methods which have proven to be both simple and effective. In this paper, we introduce Distributed Dictionary Representations (DDR), a method that applies psychological dictionaries using semantic similarity rather than word counts. This allows for the measurement of the similarity between dictionaries and spans of text ranging from complete documents to individual words. We show how DDR enables dictionary authors to place greater emphasis on construct validity without sacrificing linguistic coverage. We further demonstrate the benefits of DDR on two real-world tasks and finally conduct an extensive study of the interaction between dictionary size and task performance. These studies allow us to examine how DDR and word count methods complement one another as tools for applying concept dictionaries and where each is best applied. Finally, we provide references to tools and resources to make this method both available and accessible to a broad psychological audience.
Keywords Methodological innovation · Text analysis · Semantic representation · Dictionary-based text analysis
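The contrast drawn in the DDR abstract above, between counting dictionary words and measuring semantic similarity to a dictionary, can be illustrated with a minimal sketch. It assumes that a dictionary and a span of text are each represented by the average of their word vectors and compared by cosine similarity; the abstract itself does not spell out these details. The vectors in `TOY_VECTORS`, the function names, and the example dictionary are hypothetical and chosen only for illustration; in practice the vectors would come from a trained embedding model.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors, for illustration only.
TOY_VECTORS = {
    "kind": np.array([0.9, 0.1, 0.0]),
    "caring": np.array([0.8, 0.2, 0.1]),
    "help": np.array([0.7, 0.3, 0.0]),
    "cruel": np.array([-0.8, 0.1, 0.2]),
    "tax": np.array([0.0, 0.1, 0.9]),
    "budget": np.array([0.1, 0.0, 0.8]),
}


def mean_vector(words, vectors):
    """Average the vectors of all words found in the vocabulary."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None  # no word of this span has a vector
    return np.mean(known, axis=0)


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def dictionary_similarity(dictionary_words, text, vectors=TOY_VECTORS):
    """Similarity between a concept dictionary and a span of text."""
    d = mean_vector(dictionary_words, vectors)
    t = mean_vector(text.lower().split(), vectors)
    if d is None or t is None:
        return 0.0
    return cosine(d, t)


# A deliberately small dictionary (assumed, not taken from any published one).
care_dictionary = ["kind", "caring"]

# Neither text contains a dictionary word, so a word count would give 0 for both.
score_related = dictionary_similarity(care_dictionary, "please help")
score_unrelated = dictionary_similarity(care_dictionary, "tax budget")
print(score_related, score_unrelated)
```

In this toy setting the text "please help" scores higher than "tax budget" even though neither contains a dictionary word, which is the kind of coverage a small, tightly defined dictionary could gain from a similarity-based measure.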
Recent years have seen rapid developments in automated text analysis methods focused on measuring psychological and demographic properties. While this development has mainly been driven by computer scientists and computational linguists, such methods can be of great value for social scientists in general, and for psychologists in particular. In this paper, we review some of the most popular approaches to automated text analysis from the perspective of social scientists, and give examples of their applications in different theoretical domains. After describing some of the pros and cons of these methods, we speculate about future methodological developments, and how they might change social sciences. We conclude that, despite the fact that current methods have many disadvantages and pitfalls compared to more traditional methods of data collection, the constant increase of computational power and the wide availability of textual data will inevitably make automated text analysis a common tool for psychologists.
When do people see self-control as a moral issue? We hypothesize that the group-focused "binding" moral values of Loyalty/betrayal, Authority/subversion, and Purity/degradation play a particularly important role in this moralization process. Nine studies provide support for this prediction. First, moralization of self-control goals (e.g., losing weight, saving money) is more strongly associated with endorsing binding moral values than with endorsing individualizing moral values (Care/harm, Fairness/cheating). Second, binding moral values mediate the effect of other group-focused predictors of self-control moralization, including conservatism, religiosity, and collectivism. Third, guiding participants to consider morality as centrally about binding moral values increases moralization of self-control more than guiding participants to consider morality as centrally about individualizing moral values. Fourth, we replicate our core finding that moralization of self-control is associated with binding moral values across studies differing in measures and design, whether we measure the relationship between moral and self-control language across time, the perceived moral relevance of self-control behaviors, or the moral condemnation of self-control failures. Taken together, our findings suggest that self-control moralization is primarily group-oriented and is sensitive to group-oriented cues.
Hate speech classifiers trained on imbalanced datasets struggle to determine if group identifiers like "gay" or "black" are used in offensive or prejudiced ways. Such biases manifest in false positives when these identifiers are present, due to models' inability to learn the contexts which constitute a hateful usage of identifiers. We extract post-hoc explanations from fine-tuned BERT classifiers to detect bias towards identity terms. Then, we propose a novel regularization technique based on these explanations that encourages models to learn from the context of group identifiers in addition to the identifiers themselves. Our approach improved over baselines in limiting false positives on out-of-domain data while maintaining or improving in-domain performance.
Authors contributed equally.
"[F]or many Africans, the most threatening kind of ethnic hatred is black against black." (New York Times)
Does sharing moral values encourage people to connect and form communities? The importance of moral homophily (love of same) has been recognized by social scientists, but the types of moral similarities that drive this phenomenon are still unknown. Using both large-scale, observational social-media analyses and behavioral lab experiments, the authors investigated which types of moral similarities influence tie formations. Analysis of a corpus of over 700,000 tweets revealed that the distance between 2 people in a social network can be predicted based on differences in the moral purity content, but not other moral content, of their messages. The authors replicated this finding by experimentally manipulating perceived moral difference (Study 2) and similarity (Study 3) in the lab and demonstrating that purity differences play a significant role in social distancing. These results indicate that social network processes reflect moral selection, and both online and offline differences in moral purity concerns are particularly predictive of social distance. This research is an attempt to study morality indirectly using an observational big-data study complemented with 2 confirmatory behavioral experiments carried out using traditional social-psychology methodology.
In this paper we present a computational text analysis technique for measuring the moral loading of concepts as they are used in a corpus. This method is especially useful for the study of online corpora as it allows for the rapid analysis of moral rhetoric in texts such as blogs and tweets as events unfold. We use latent semantic analysis to compute the semantic similarity between concepts and moral keywords taken from the “Moral Foundations Dictionary”. This measure of semantic similarity represents the loading of these concepts on the five moral dimensions identified by Moral Foundations Theory. We demonstrate the efficacy of this method using three different concepts and corpora.
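The latent semantic analysis step described in the abstract above can be sketched on a toy corpus. This is an illustration under stated assumptions, not the authors' implementation: it uses raw term counts without any weighting, a truncated SVD with `k = 2` dimensions, and the mean cosine similarity between a concept and a keyword list as the "loading". The documents, the keyword list `harm_keywords`, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus: two documents about one topic, two about another.
docs = [
    "immigration harm suffering protect",
    "immigration protect harm",
    "football score goal",
    "football goal win",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix (rows: terms, columns: documents).
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[index[w], j] += 1

# Truncated SVD: keep the k largest singular values (k = 2 is an assumption).
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
term_vectors = U[:, :k] * S[:k]  # one k-dimensional vector per term


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def moral_loading(concept, keywords):
    """Mean cosine similarity between a concept and in-vocabulary keywords."""
    c = term_vectors[index[concept]]
    sims = [cosine(c, term_vectors[index[w]]) for w in keywords if w in index]
    return float(np.mean(sims)) if sims else 0.0


# Hypothetical stand-in for one dimension's keyword list.
harm_keywords = ["harm", "suffering", "protect"]

loading_immigration = moral_loading("immigration", harm_keywords)
loading_football = moral_loading("football", harm_keywords)
print(loading_immigration, loading_football)
```

In this toy corpus "immigration" co-occurs with the keyword list and "football" does not, so the first loading comes out much higher than the second. With a real corpus, the same computation would be repeated with one keyword list per moral dimension.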