2013
DOI: 10.1109/mis.2013.34
YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context

Cited by 317 publications (169 citation statements)
References 6 publications
“…While recent work has started to study online video, it has been either in the passive viewer case as in [38] (that analyzed observers of video advertising who essentially do not talk), or has used limited facial expression cues (smiles only) in the context of online video reviews (not addressing the personality inference task) [52] [40]. In contrast to these works, our work studies a much richer set of facial expression cues derived from all the basic facial expressions as estimated by a FACS-based recognizer.…”
Section: Analyzing Personality Impressions
confidence: 99%
“…As future work there are several avenues likely to improve on our results. Beyond late fusion, other ways to combine prosodic and lexical similarity should be tried (Wollmer et al, 2013;Bruni et al, 2014). For example, recent developments in vector space representations of words (Turian et al, 2010;Erk, 2012;Mikolov et al, 2013;Huang et al, 2013), suggest that it could be productive to build a unified lexico-prosodic vector-space model of both meaning and dialog activity.…”
Section: Discussion
confidence: 99%
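The statement above contrasts late fusion with richer ways of combining prosodic and lexical similarity. As a point of reference, the sketch below shows what late fusion means in its simplest form: each modality gets its own classifier, and only the output scores are combined. This is a minimal illustration, not the method of the cited papers; the `late_fusion` helper and its weighting scheme are assumptions for demonstration.

```python
# Minimal late-fusion sketch (illustrative assumption, not the cited
# papers' exact method). Each modality-specific classifier produces a
# score in [0, 1]; late fusion combines scores, not raw features.

def late_fusion(scores, weights=None):
    """Weighted average of per-modality scores (hypothetical helper)."""
    if weights is None:
        weights = [1.0] * len(scores)
    total = sum(weights)
    # Combine decisions at the score level, after each classifier runs.
    return sum(s * w for s, w in zip(scores, weights)) / total

# Example: a prosodic classifier scores 0.8, a lexical one scores 0.6;
# equal weights give a fused score of 0.7.
fused = late_fusion([0.8, 0.6], weights=[0.5, 0.5])
```

By contrast, the "unified lexico-prosodic vector-space model" the authors suggest would merge the modalities before classification, rather than averaging separate decisions afterward.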
“…Most research relating to using prosody for audio search has focused on detecting dialog activities that people might like to search for. Prosody-based classifiers can, for example, spot interactional "hotspots" where the speakers are unusually involved (Wrede and Shriberg, 2003;Oertel et al, 2011), conflicts (Kim et al, 2012), agreements on action items (Purver et al, 2007), various emotional and attitudinal states and stances (Toivanen and Seppänen, 2002;Wollmer et al, 2013), and dialog acts such as question, apology, promise, and persuasion attempt (Larson et al, 2011;Freedman et al, 2011). This work has shown many dialog activities are indeed associated with characteristic prosodic features and patterns.…”
Section: Background: Prosody for Search in Speech
confidence: 99%
“…CNN trained on faces) features; for instance, recurrent neural network (RNN) and 3D convolutional networks (C3D), specifically trained on faces, have been combined with audio features by Fan et al [12]. Wöllmer et al [33] try to understand the speaker's sentiment in on-line videos containing movie reviews by leveraging acoustic, visual and linguistic features. Rosas et al [24] use a similar approach to classify the speaker's emotion in Spanish videos.…”
Section: Related Work
confidence: 99%
“…Several works have proposed features for recognizing emotion in faces (e.g. [7,33]), but optimal features for this task are still unclear.…”
Section: Introduction
confidence: 99%