COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

Zlatintsi, Athanasia; Koutras, Petros; Evangelopoulos, Georgios; Malandrakis, Nikolaos; Efthymiou, Niki; Pastra, Katerina; Potamianos, Alexandros; Maragos, Petros

doi:10.1186/s13640-017-0194-1

Cited by 43 publications

(41 citation statements)

References 83 publications

“…Fsτ i (13) In the model, excitation and inhibition are determined according to the expressions shown in Equation 14. Firstly, excitation e is calculated on the input a_in.…”

Section: Sensory Activation Stage (mentioning)

confidence: 99%

“…Sensory saliency is determined by the enhanced sensitivity or tuning of the human hearing system to specific sound features [12]. On the other hand, semantic saliency requires recognition of the sound and incongruency within the environment [13]. Sensory saliency has been investigated by explicitly identifying features that alter behavior [12] or by inspection of the spectrogram using methods similar to the ones used to model visual saliency [14].…”

Section: Introduction (mentioning)

confidence: 99%

Auditory sensory saliency as a better predictor of change than sound amplitude in pleasantness assessment of reproduced urban soundscapes

Filipan, Coensel, Aumond, et al. 2019

Building and Environment

The sonic environment of the urban public space is often experienced while walking through it. Nevertheless, city dwellers are usually not actively listening to the environment when traversing the city. Therefore, sound events that are salient, i.e. stand out of the sonic environment, are the ones that trigger attention and contribute highly to the perception of the soundscape. In a previously reported audiovisual perception experiment, the pleasantness of a recorded urban sound walk was continuously evaluated by a group of participants. To detect salient events in the soundscape, a biologically-inspired computational model for auditory sensory saliency based on spectrotemporal modulations is proposed. Using the data from a sound walk, the present study validates the hypothesis that salient events detected by the model contribute to changes in soundscape rating and are therefore important when evaluating the urban soundscape. Finally, when using the data from an additional experiment without a strong visual component, the importance of auditory sensory saliency as a predictor for change in pleasantness assessment is found to be even more pronounced.

“…COGNIMUSE is a collection of videos annotated with sensory and semantic saliency, events, cross-media semantics, and emotions [178]. A subset of 3.5h extracted from movies, including textual modality, are annotated on arousal and valence.…”

Section: Datasets For AC Of Multimodal Data (mentioning)

confidence: 99%

Affective Computing for Large-scale Heterogeneous Multimedia Data

Zhao, Wang, Soleymani, et al. 2019

ACM Trans. Multimedia Comput. Commun. Appl.

The wide popularity of digital photography and social networks has generated a rapidly growing volume of multimedia data (i.e., image, music, and video), resulting in a great demand for managing, retrieving, and understanding these data. Affective computing (AC) of these data can help to understand human behaviors and enable wide applications. In this article, we survey the state-of-the-art AC technologies comprehensively for large-scale heterogeneous multimedia data. We begin this survey by introducing the typical emotion representation models from psychology that are widely employed in AC. We briefly describe the available datasets for evaluating AC algorithms. We then summarize and compare the representative methods on AC of different multimedia types, i.e., images, music, videos, and multimodal data, with the focus on both handcrafted features-based methods and deep learning methods. Finally, we discuss some challenges and future directions for multimedia affective computing.

“…The most relevant dataset to our tasks is the COGNIMUSE database [70,1], which constitutes a video database annotated with ground-truth annotations for frame-wise sensory and semantic importance as well as audio and visual events. It is a generic database that has been used for video summarization [36], as well as audio-visual concept recognition [7].…”

Section: Datasets (mentioning)

confidence: 99%

SUSiNet: See, Understand and Summarize It

Koutras, Maragos 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Self Cite

In this work we propose a multi-task spatio-temporal network, called SUSiNet, that can jointly tackle the spatiotemporal problems of saliency estimation, action recognition and video summarization. Our approach employs a single network that is jointly end-to-end trained for all tasks with multiple and diverse datasets related to the exploring tasks. The proposed network uses a unified architecture that includes global and task specific layer and produces multiple output types, i.e., saliency maps or classification labels, by employing the same video input. Moreover, one additional contribution is that the proposed network can be deeply supervised through an attention module that is related to human attention as it is expressed by eye-tracking data. From the extensive evaluation, on seven different datasets, we have observed that the multi-task network performs as well as the state-of-the-art single-task methods (or in some cases better), while it requires less computational budget than having one independent network per each task.
