2017
DOI: 10.1186/s13640-017-0194-1

COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

Abstract: Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. The purpose of this database is manifold; it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media e…

Cited by 43 publications (41 citation statements)
References 83 publications
“…Fsτ i (13) In the model, excitation and inhibition are determined according to the expressions shown in Equation 14. Firstly, excitation e is calculated on the input a in .…”
Section: Sensory Activation Stage (mentioning)
confidence: 99%
“…Sensory saliency is determined by the enhanced sensitivity or tuning of the human hearing system to specific sound features [12]. On the other hand, semantic saliency requires recognition of the sound and incongruency within the environment [13]. Sensory saliency has been investigated by explicitly identifying features that alter behavior [12] or by inspection of the spectrogram using methods similar to the ones used to model visual saliency [14].…”
Section: Introduction (mentioning)
confidence: 99%
“…COGNINMUSE is a collection of videos annotated with sensory and semantic saliency, events, cross-media semantics, and emotions [178]. A subset of 3.5h extracted from movies, including textual modality, are annotated on arousal and valence.…”
Section: Datasets For Ac Of Multimodal Data (mentioning)
confidence: 99%
“…The most relevant dataset to our tasks is the COGNIMUSE database [70,1], which constitutes a video database annotated with ground-truth annotations for frame-wise sensory and semantic importance as well as audio and visual events. It is a generic database that has been used for video summarization [36], as well as audio-visual concept recognition [7].…”
Section: Datasets (mentioning)
confidence: 99%