Research in affective computing requires ground-truth data for training and benchmarking computational models of machine-based emotion understanding. In this paper, we propose a large video database, LIRIS-ACCEDE, for affective content analysis and related applications, including video indexing, summarization, and browsing. In contrast to existing datasets, which offer very few videos and limited accessibility due to copyright constraints, LIRIS-ACCEDE consists of 9,800 good-quality video excerpts with large content diversity. All excerpts are shared under Creative Commons licenses and can thus be freely distributed without copyright issues. Affective annotations were collected through crowdsourcing using a pairwise video comparison protocol, ensuring that annotations are highly consistent, as attested by a high inter-annotator agreement despite the large diversity of the raters' cultural backgrounds. In addition, to enable fair comparison and to benchmark the progress of future affective computational models, we further provide four experimental protocols and a baseline for emotion prediction using a large set of both visual and audio features. The dataset (video clips, annotations, features, and protocols) is publicly available at http://liris-accede.ec-lyon.fr/.
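The pairwise comparison protocol above yields "A was rated above B" judgments rather than absolute scores. As a minimal, hypothetical sketch (the paper's actual sorting protocol differs), such judgments can be turned into a global ranking with a Bradley-Terry fit:

```python
# Minimal Bradley-Terry fit: turn pairwise "winner, loser" votes into
# global preference scores. Illustrative only; not the authors' protocol.
from collections import defaultdict

def bradley_terry(pairs, n_iter=100):
    """pairs: list of (winner, loser) comparison outcomes."""
    items = {x for p in pairs for x in p}
    wins = defaultdict(int)    # total wins per item
    games = defaultdict(int)   # comparison count per unordered pair
    for w, l in pairs:
        wins[w] += 1
        games[frozenset((w, l))] += 1
    score = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            denom = sum(games[frozenset((i, j))] / (score[i] + score[j])
                        for j in items if j != i and games[frozenset((i, j))])
            new[i] = wins[i] / denom if denom else score[i]
        total = sum(new.values())
        score = {i: v / total for i, v in new.items()}
    return score

votes = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]
scores = bradley_terry(votes)
ranking = sorted(scores, key=scores.get, reverse=True)  # -> ["a", "b", "c"]
```

Items that win more of their comparisons get higher scores, so sorting by score recovers a valence-like ordering from purely relative judgments.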
Recently, mainly owing to advances in deep learning, performance in scene and object recognition has improved dramatically. By contrast, more subjective recognition tasks, such as emotion prediction, have stagnated at moderate levels. In this context, can affective computational models benefit from the breakthroughs of deep learning? This paper introduces the strengths of deep learning into emotion prediction in videos. The two main contributions are as follows: (i) a new, publicly available dataset, composed of 30 movies under Creative Commons licenses and continuously annotated along the induced valence and arousal axes, for which (ii) the performance of Convolutional Neural Networks (CNNs) through supervised fine-tuning, of Support Vector Machines for Regression (SVR), and of their combination (transfer learning) is computed and discussed. To the best of our knowledge, this is the first approach in the literature using CNNs to predict dimensional affective scores from videos. The experimental results show that the limited size of the dataset prevents the learning or fine-tuning of CNN-based frameworks, but that transfer learning is a promising way to improve affective movie content analysis frameworks as long as very large datasets annotated along affective dimensions remain unavailable.
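The transfer-learning setup described above freezes a pretrained CNN as a fixed feature extractor and trains an SVR on its activations. A minimal sketch, with a stand-in stub for the CNN features and a linear SVR trained by primal SGD on the epsilon-insensitive loss (the paper uses a pretrained deep network and standard SVR implementations):

```python
# Sketch of the transfer-learning pipeline: frozen "CNN" features + linear SVR.
# cnn_features is a hypothetical stand-in for pretrained-layer activations.
import random

def cnn_features(frame_id, dim=8):
    rng = random.Random(frame_id)          # deterministic fake activations
    return [rng.random() for _ in range(dim)]

def train_linear_svr(X, y, epochs=300, lr=0.01, eps=0.05, C=10.0):
    # Primal SGD on the epsilon-insensitive loss with L2 regularization.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - t
            g = 0.0 if abs(err) <= eps else (1.0 if err > 0 else -1.0)
            w = [wi - lr * (wi / C + g * xi) for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

X = [cnn_features(f) for f in range(20)]
y = [x[0] for x in X]                      # synthetic valence targets
w, b = train_linear_svr(X, y)
preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]
mae = sum(abs(p - t) for p, t in zip(preds, y)) / len(y)
baseline = sum(abs(sum(y) / len(y) - t) for t in y) / len(y)
```

Because only the small regressor is trained, this works with far less annotated data than fine-tuning the CNN end-to-end, which is exactly the regime the abstract describes.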
In our present society, cinema has become one of the major forms of entertainment, providing unlimited contexts of emotion elicitation for the emotional needs of human beings. Since emotions are universal and shape all aspects of our interpersonal and intellectual experience, they have proved to be a highly multidisciplinary research field, spanning psychology, sociology, neuroscience, and computer science. However, work on affective multimedia content analysis from the computer science community has benefited little from the progress achieved in these other fields. In this paper, a multidisciplinary state of the art for affective movie content analysis is given, in order to promote and encourage exchanges between researchers from a very wide range of fields. In contrast to other state-of-the-art papers on affective video content analysis, this work confronts the ideas and models of psychology, sociology, neuroscience, and computer science. The concepts of aesthetic emotions and emotion induction, as well as the different representations of emotions, are introduced based on psychological and sociological theories. Previous global and continuous affective video content analysis work, including video emotion recognition and violence detection, is also presented in order to point out its limitations.
To contribute to the need for emotional databases and affective tagging, LIRIS-ACCEDE is proposed in this paper. LIRIS-ACCEDE is an Annotated Creative Commons Emotional DatabasE composed of 9,800 video clips extracted from 160 movies shared under Creative Commons licenses, which makes the database publicly available without copyright issues. The 9,800 video clips (each 8-12 seconds long) are sorted along the induced valence axis, from the video perceived most negatively to the video perceived most positively. The annotation was carried out by 1,518 annotators from 89 different countries using crowdsourcing. A baseline late-fusion scheme using the annotation ground truth is computed to predict emotion categories in the video clips.
3D processing techniques are promising, but several hurdles remain to be overcome, and this paper examines two of them. The first relates to the management of high disparity, which is currently not well mastered and strongly affects the viewing of a 3D scene on stereoscopic screens. The second concerns the salient regions of the scene, commonly called Regions of Interest (RoIs) in the image processing domain. A problem appears when a video scene contains more than one region of interest: it then becomes difficult for the eyes to scan them, especially when the depth difference between them is large. In this contribution, the 3D experience is improved by applying effects related to the RoIs. The shift between the two views is adaptively adjusted to obtain null disparity on a given area of the scene; in the proposed approach, these areas are the visually interesting ones. Keeping a constant disparity on the salient areas improves the viewing experience over the video sequence.
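The core adjustment described above can be sketched very simply: given a disparity map and a saliency-derived RoI mask, compute the horizontal shift of one view that zeroes the mean disparity over the RoI. Function and variable names here are assumptions for illustration; the paper's adaptive per-scene mechanism is more involved.

```python
# Sketch: horizontal shift (in pixels) to apply to one view so that the
# mean disparity over the salient region (RoI) becomes zero.
def null_disparity_shift(disparity_map, roi_mask):
    vals = [d for drow, mrow in zip(disparity_map, roi_mask)
              for d, m in zip(drow, mrow) if m]
    return -sum(vals) / len(vals) if vals else 0.0

disp = [[4.0, 6.0],
        [5.0, 20.0]]
mask = [[1, 1],
        [1, 0]]          # RoI excludes the large-disparity background pixel
shift = null_disparity_shift(disp, mask)   # -> -5.0
```

Applying this shift places the salient region at the screen plane, so the viewer's eyes converge where they are most likely to look.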
The objective of colour mapping or colour transfer methods is to recolour a given image or video by deriving a mapping between that image and another image serving as a reference. These methods have received considerable attention in recent years, both in the academic literature and in industrial applications. Methods for recolouring images have often appeared under the labels of colour correction, colour transfer, or colour balancing, to name a few, but their goal is always the same: mapping the colours of one image to another. In this paper, we present a comprehensive overview of these methods and offer a classification of current solutions depending not only on their algorithmic formulation but also on their range of applications. We also provide a new dataset and a novel evaluation technique called 'evaluation by colour mapping roundtrip'. We discuss the relative merit of each class of techniques through examples and show how colour mapping solutions have been applied to a diverse range of problems.
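One classic member of this family is statistics transfer in the style of Reinhard et al., which maps the source so that each channel's mean and standard deviation match the reference. A single-channel sketch (the original method operates per channel in a decorrelated lαβ colour space):

```python
# Reinhard-style statistics transfer, one channel for brevity.
import statistics

def match_stats(src, ref):
    mu_s, sd_s = statistics.mean(src), statistics.pstdev(src)
    mu_r, sd_r = statistics.mean(ref), statistics.pstdev(ref)
    scale = sd_r / sd_s if sd_s else 1.0
    # Centre on the source mean, rescale to the reference spread, re-centre.
    return [(v - mu_s) * scale + mu_r for v in src]

src = [10.0, 20.0, 30.0]
ref = [100.0, 110.0, 120.0]
out = match_stats(src, ref)
```

After the mapping, the recoloured values carry the reference image's global colour statistics while preserving the source's relative structure.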
The focus of this paper is automatic color harmonization, which amounts to re-coloring an image so that the resulting color palette is more harmonious for human observers. The proposed automatic algorithm builds on the pioneering works described in [3,12], where templates of harmonious colors are defined on the hue wheel. We bring three contributions in this paper: first, saliency [9] is used to predict the most attractive visual areas and estimate a consistent harmonious template. Second, an efficient color segmentation algorithm, adapted from [4], is proposed to perform consistent color mapping. Third, a new mapping function replaces the usual color-shifting method. Results show that the method limits the visual artifacts of state-of-the-art methods and leads to a visually consistent harmonization.
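The hue-wheel templates mentioned above define angular sectors of harmonious hues. As a simplified sketch (a single "i-type" 18-degree sector, with hard snapping to the nearest sector edge; the paper's templates and its new mapping function are more elaborate):

```python
# Sketch: project hues onto a single harmonious template sector.
def circ_dist(a, b):
    # Angular distance on the 360-degree hue wheel.
    d = abs(a - b) % 360
    return min(d, 360 - d)

def harmonize_hue(h, center=0.0, width=18.0):
    if circ_dist(h, center) <= width / 2:
        return h % 360                     # already inside the sector
    e1 = (center - width / 2) % 360        # sector edges
    e2 = (center + width / 2) % 360
    # Snap to the nearest edge (usual methods shift smoothly instead).
    return e1 if circ_dist(h, e1) < circ_dist(h, e2) else e2
```

For example, with the sector centred at 0 degrees, a hue of 5 is kept while a hue of 40 snaps to the edge at 9 degrees; estimating the template from the salient areas, as the paper does, keeps the most-looked-at colors unchanged.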
Modeling human visual attention in a computational attention model leads to splitting visual features into several independent channels. A difficult problem then arises: combining the resulting maps, which have different dynamic ranges and distributions. When several maps are considered, such a combination is mandatory in order to compute a single measure of interest for each location, regardless of which features contributed to the salience. Several cue-combination strategies are proposed in this paper, for the spatial cues as well as for temporal saliency. Finally, user tests on still-image and video databases highlight one operator.
Index Terms: visual attention, computational model, map fusion, user experiments, eye-tracker.
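One classic fusion operator of the kind compared above normalizes each feature map to a common range and weights it by how strongly it is peaked, so maps with a few clear salient spots dominate diffuse ones. A simplified sketch (a peak-minus-mean weighting in the spirit of Itti-style normalization; the paper evaluates several such operators):

```python
# Sketch: normalize feature maps to [0, 1], weight each by (max - mean)^2,
# then sum into a single saliency map (maps shown as flat lists).
def normalize(m):
    lo, hi = min(m), max(m)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in m]

def fuse(maps):
    fused = [0.0] * len(maps[0])
    for m in maps:
        n = normalize(m)
        w = (max(n) - sum(n) / len(n)) ** 2   # peaked maps get large w
        fused = [f + w * v for f, v in zip(fused, n)]
    return fused

peaked  = [0.0, 0.0, 1.0, 0.0]    # one strong candidate location
diffuse = [0.5, 0.6, 0.5, 0.6]    # near-uniform activity
sal = fuse([peaked, diffuse])
```

Here the single-peak map receives a larger weight than the near-uniform one, so its location wins in the fused map, which is the behaviour this class of operators is designed to produce.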