Abstract: An important aspect of human emotion perception is the use of contextual information to understand others' feelings, even in situations where their behavior is not very expressive or has an emotionally ambiguous meaning. For technology to successfully detect affect, it must mimic this human ability when analyzing audiovisual input. Databases upon which machine learning algorithms are trained should capture the context of social interactions as well as the behavior expressed in them. However, there is a lack of …
“…Ideally, data should be annotated without the annotation process influencing the (labelled) data, and collected without the collecting itself influencing the data. The former issue was raised in a recent review by Dudzik et al. [87], describing how an annotator (perceiver of the emotion data) can be biased in his or her interpretation of other people's emotions. Regarding the latter issue, an example of the ideal situation could be to search for social media messages related to an event, where the collection itself will not influence the content of these messages.…”
This review aims to summarize and describe research on the topic of automatic group emotion recognition. In recent years, emotion analysis of groups or crowds has gained interest, with studies performing emotion detection in different contexts, using different datasets and modalities (such as images, video, audio, and social media messages), and taking different approaches. Articles were included using an innovative search method involving Dense Query Extraction and automatic cross-referencing. We discuss the types of groups and emotion models considered in automatic emotion recognition research, common datasets for all modalities, general approaches taken, and reported performances. These performances are discussed, followed by an analysis of the application possibilities of the discussed methods. To ensure clear, replicable, and comparable studies, we suggest that research test on multiple common datasets and report on multiple metrics where possible. Implementation details and code should also be made available where possible. An area of interest for future work is to build systems with more real-world application possibilities: coping with changing group sizes, different emotional subgroups, and changing emotions over time, while being more robust and working with datasets with reduced biases.
“…Such information about triggering events has a strong role in interpreting facial behavior [42]. Affective detection work has only tentatively explored this aspect because it is conceptually challenging to translate into automatic systems and generally lacks available corpora for modeling [23].…”
Section: Context In Affect Detection
“…The insights gained by this act of emotional perspective-taking can complement any information offered by behavior in isolation, thereby enabling an observer to make accurate inferences even for ambiguous cases (e.g., [41]). However, context-sensitive approaches remain under-explored in automatic affect detection [23], despite researchers generally acknowledging their potential [60,66,68]. Likely causes for this neglect are the substantial challenges involved in (1) identifying relevant contextual influences for emotional responses in an application setting, as well as (2) developing technical solutions that provide automatic systems with an awareness of them [29].…”
Section: Introduction
“…Likely causes for this neglect are the substantial challenges involved in (1) identifying relevant contextual influences for emotional responses in an application setting, as well as (2) developing technical solutions that provide automatic systems with an awareness of them [29]. Overcoming these challenges requires systematic exploration of person- and situation-specific influences in computational modeling activities [23], informed by findings from the social sciences [3]. Compared to emotional responses in general, situations in which video stimuli are consumed by an individual provide a more constrained scenario for the exploration of relevant contextual influences.…”
Empirical evidence suggests that the emotional meaning of facial behavior in isolation is often ambiguous in real-world conditions. While humans complement interpretations of others' faces with additional reasoning about context, automated approaches rarely display such context-sensitivity. Empirical findings indicate that the personal memories triggered by videos are crucial for predicting viewers' emotional responses to such videos, in some cases even more so than the video's audiovisual content. In this article, we explore the benefits of personal memories as context for facial behavior analysis. We conduct a series of multimodal machine learning experiments combining the automatic analysis of video-viewers' faces with that of two types of context information for affective predictions: (1) self-reported free-text descriptions of triggered memories and (2) a video's audiovisual content. Our results demonstrate that both sources of context provide models with information about variation in viewers' affective responses that complements facial analysis and each other. CCS Concepts: • Human-centered computing → Empirical studies in ubiquitous and mobile computing.
“…Simply speaking, academia is aiming for "in the wild" data collections, meaning, to process information of people even when they are not aware of it. This entails the use of data enriched with additional metadata such as age, sex, profession, socio-demographic information (Dudzik et al., 2019), or specific personality traits (similar to Big Data studies that use freely available data found on the Internet). We consider this type of data (e.g., sex, age, language, proficiency, personality, etc.)…”
The European Union (EU) General Data Protection Regulation (GDPR) has a direct impact on research activities, as it raises awareness of personal rights not only among scientists but also among the data subjects whose information scientists process. This paper presents the dilemma related to the privacy of audio and video data, compliance with the EU GDPR, and techniques to anonymize and pseudonymize such data. We further discuss issues of “in the wild” personal data collection by focusing on multi-modal collections, mainly of audio and video. Throughout this paper, we define relevant core issues and highlight two challenges of “in the wild” data collection: Internet crawling and public data collecting. In the last section, some exemplary use cases demonstrate the raised issues, illustrating how GDPR affects the collection of publicly available data, how privacy concerns influence participant behavior, and which de-anonymization levels can be reached with what kind of data. The key point we present is that the identity of the participants is revealed in the voice or video signal, while the latter is at the same time the object of the research. One implication is that the research community has to actively disconnect the data from the personal information on the participants; hence the importance of a process of anonymization or omission of data for research activity. This entails the development of an infrastructure for data access control to enable data sharing among researchers.