In this paper we introduce the novel problem of understanding visual persuasion. Modern mass media make extensive use of images to persuade people to make commercial and political decisions. These effects and techniques are widely studied in the social sciences, but behavioral studies do not scale to massive datasets. Computer vision has made great strides in building syntactical representations of images, such as detection and identification of objects. However, the pervasive use of images for communicative purposes has been largely ignored. We extend the significant advances in syntactic analysis in computer vision to the higher-level challenge of understanding the underlying communicative intent implied in images. We begin by identifying nine dimensions of persuasive intent latent in images of politicians, such as "socially dominant," "energetic," and "trustworthy," and propose a hierarchical model that builds on the layer of syntactical attributes, such as "smile" and "waving hand," to predict the intents presented in the images. To facilitate progress, we introduce a new dataset of 1,124 images of politicians labeled with ground-truth intents in the form of rankings. This study demonstrates that a systematic focus on visual persuasion opens up the field of computer vision to a new class of investigations around mediated images, intersecting with media analysis, psychology, and political communication.
The portrayal of the actions of fictive characters for purposes of entertainment is a familiar phenomenon. Theories that seek to explain why we are attracted to such fictions and whether we learn from them have produced no consensus and no adequate overall account. In this paper, we present the hypothesis that entertainment relies on cognitive adaptations for pretend play. As a simplified model system, we draw on our field study of children's chase play, which is characterized by an elementary form of pretense. The children pretend, at first without consciously representing their pretense, to be chased by predators. The details of this behavior, widespread among mammals, indicate that the biological function of the game may be to train predator-evasion strategies. Chase play, we suggest, evolved in early mammals because it enabled cheap and plentiful resources to be used to train strategies for events that are rare, dangerous, and expensive. More generally, we argue that pretense is used to access spaces of possible actions in order to locate and practice new strategies. It relies on the creation of a simulated scenario and requires sophisticated source monitoring. The simulation is experienced as intrinsically rewarding; boredom is a design feature that motivates the construction of a more appropriate pedagogical situation, while the thrill of play signals optimal learning conditions. The conscious narrative elaboration of chase games involves an elementary form of role play, in which, we propose, a virtual agent is created that tracks and acts on the memories required for coherent action within the simulation. These complex if familiar design features, we suggest, provide a minimalist functional and adaptationist account of the central features of entertainment: that it is fun, that it involves us imaginatively and emotionally, and that it has a tacit pedagogical effect. The model provides a principled and testable account of fiction-based entertainment grounded in evolutionary and cognitive processes.
Research into the multimodal dimensions of human communication faces a set of distinctive methodological challenges. Collecting the datasets is resource-intensive, analysis often lacks peer validation, and the absence of shared datasets makes it difficult to develop standards. External validity is hampered by small datasets, yet large datasets are intractable. Red Hen Lab spearheads an international infrastructure for data-driven multimodal communication research, facilitating an integrated cross-disciplinary workflow. Linguists, communication scholars, statisticians, and computer scientists work together to develop research questions, annotate training sets, and develop pattern discovery and machine learning tools that handle vast collections of multimodal data, beyond the dreams of previous researchers. This infrastructure makes it possible for researchers at multiple sites to work in real time in transdisciplinary teams. We review the vision, progress, and prospects of this research consortium.