2013
DOI: 10.1109/tmm.2013.2267205

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Abstract: Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher-level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking.…

Cited by 220 publications (170 citation statements)
References 66 publications
“…All subjects, having recently watched the 3 movies, were independently shown the skims in a consecutive manner and in random order. As in [20], the scale was graded the following way: poor (0%-40%), fair (40%-60%), good (60%-75%), very good (75%-90%) and excellent (90%-100%).…”
Section: Discussion (mentioning)
confidence: 99%
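
The grading scale quoted in this statement can be sketched as a simple lookup. This is only an illustration, not code from the citing paper: the function name `grade` and the table `SCALE` are hypothetical, and because the quoted bands share their end points (40% appears in both "poor" and "fair"), the sketch assumes that a score on a boundary falls into the higher band.

```python
# Upper bounds (in percent) and labels, taken from the quoted scale.
# Assumption: a score exactly on a shared boundary goes to the higher band.
SCALE = [
    (40, "poor"),
    (60, "fair"),
    (75, "good"),
    (90, "very good"),
]


def grade(score_percent):
    """Map a rating in [0, 100] to the verbal label of the quoted scale."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must lie between 0 and 100 percent")
    for upper, label in SCALE:
        if score_percent < upper:
            return label
    # Everything from 90% up to and including 100% is the top band.
    return "excellent"


if __name__ == "__main__":
    for score in (12, 40, 74.9, 75, 100):
        print(score, grade(score))
```

A different boundary convention (for example, assigning 40% to "poor") would only require changing `<` to `<=` in the comparison.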
“…Table 1. Extracted audio features: Energy, MFCCs (1-12), LPCs (0-13), ZCR, Spectral Flux, Spectral Rolloff, Chroma Vector (0-11), Clarity. Functionals: mean and standard deviation.…”
Section: Low Level Descriptor Functionals (mentioning)
confidence: 99%
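
To make the idea of functionals over low-level descriptors concrete, the following is a minimal sketch that computes two of the listed frame-level descriptors (short-time energy and zero-crossing rate) and summarises each with its mean and standard deviation. It is not the citing paper's implementation: all function names, the frame length, the hop size, and the use of the population standard deviation are assumptions made for illustration, and the remaining descriptors (MFCCs, LPCs, spectral flux, and so on) are omitted.

```python
import statistics


def split_frames(signal, frame_len, hop):
    """Cut a sample sequence into full frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def functionals(values):
    """Summarise a per-frame descriptor track by mean and (population) std."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def describe(signal, frame_len=4, hop=4):
    """Hypothetical helper: clip-level features from frame-level descriptors."""
    frames = split_frames(signal, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    return {
        "energy": functionals([short_time_energy(f) for f in frames]),
        "zcr": functionals([zero_crossing_rate(f) for f in frames]),
    }


if __name__ == "__main__":
    toy_signal = [1, -1, 1, -1, 1, 1, 1, 1]  # one oscillating frame, one flat frame
    print(describe(toy_signal))
```

Each descriptor contributes a fixed number of values per clip regardless of clip length, which is what makes such functionals convenient as classifier inputs.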
“…Emotion recognition is an active research area with many applications such as human assistive systems [1], autonomous video summarisation [2], diagnosing patients mental illness, monitoring the drivers emotion variations to avoid accidents and helping the man-machine interactions [3]. Emotion recognition systems can also find applications in key event detection tasks [2], affective analysis in music [4] or dialogue management [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…The process of key frame extraction is also known as "key-framing", "story-boarding" or "static video summarization". A video skim is a video of shorter length than the input stream, which is known as "dynamic video summarization" [1], [5], [6], [7], [8], [9].…”
Section: Introduction (mentioning)
confidence: 99%