Movie summarization based on audiovisual saliency detection

Evangelopoulos, Georgios; Rapantzikos, Konstantinos; Potamianos, Alexandros; Maragos, Petros; Zlatintsi, Athanasia; Avrithis, Yannis

doi:10.1109/icip.2008.4712308

Cited by 37 publications (26 citation statements)

References 14 publications

“…Motion, face, camera and audio attention models were cues to capture salient information and identify the segments to compose a summary [2]. In our previous work, saliency was modeled independently in each modality, using meaningful temporal modulations in multiple frequencies for the audio and spatiotemporal features (color, motion, intensity) for the visual stream [4,5]. An integrated audiovisual saliency curve formed the basis of a bottom-up, content-independent, summarization technique.…”

Section: Introduction (mentioning, confidence: 99%)
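The audio cue in the excerpt above rests on modulations in several frequency bands. Below is a minimal Python sketch of one way to measure them. It assumes the Teager-Kaiser energy operator, a standard nonlinear operator for AM-FM energy tracking, applied to signals that have already been band-pass filtered. The filterbank itself, the frame length and the "dominant band" rule in the hypothetical `dominant_band_energy` helper are illustrative assumptions, not the cited authors' exact feature set.

```python
import numpy as np


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The two edge samples copy their nearest interior neighbor.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi


def dominant_band_energy(bands, frame_len):
    """Hypothetical per-frame audio cue: the largest mean Teager energy
    over a set of already band-passed signals of equal length.

    bands: array-like of shape (n_bands, n_samples); trailing samples that
    do not fill a whole frame are dropped.
    """
    bands = np.asarray(bands, dtype=float)
    n_bands, n_samples = bands.shape
    n_frames = n_samples // frame_len
    energy = np.array([teager_kaiser(b) for b in bands])
    energy = energy[:, : n_frames * frame_len].reshape(n_bands, n_frames, frame_len)
    # Mean energy per band and frame, then keep the strongest band per frame.
    return energy.mean(axis=2).max(axis=0)
```

For a pure tone A cos(Ωn + φ) the operator returns the constant A² sin²(Ω). It therefore responds jointly to amplitude and frequency, which makes it attractive for tracking modulation energy.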

“…The segment selection and skim rendering algorithm [5], based on the multimodal saliency curve follows the steps: 1. AVT is filtered with a median filter of length 2M + 1 frames.…”

Section: Video Summarization (mentioning, confidence: 99%)
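Step 1 of the skim algorithm quoted above can be sketched as follows. This minimal Python version assumes the AVT curve is a 1-D array with one saliency value per frame. The name `median_smooth` is hypothetical, and the edge-padding rule is an assumption because the excerpt does not specify one. The remaining selection and rendering steps are not shown in the excerpt and are not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_smooth(curve, M):
    """Median-filter a per-frame saliency curve with a window of 2M + 1 frames.

    The curve is padded by repeating its first and last values (an assumed
    edge rule), so the output has the same length as the input.
    """
    curve = np.asarray(curve, dtype=float)
    if M < 0:
        raise ValueError("M must be non-negative")
    padded = np.pad(curve, M, mode="edge")
    windows = sliding_window_view(padded, 2 * M + 1)  # shape (len(curve), 2M + 1)
    return np.median(windows, axis=1)


# A one-frame spike is removed while the sustained rise at the end survives.
avt = [0, 0, 9, 0, 0, 1, 1, 1]
print(median_smooth(avt, M=1))  # [0. 0. 0. 0. 0. 1. 1. 1.]
```

A median rather than a mean filter suppresses isolated one-frame peaks without smearing them into their neighbors, which suits a curve that later drives segment selection.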

“…Perceptual attention is triggered by changes in the involved events like scene transitions, progressions or newly introduced themes. Computational models of attention have been previously developed using multimodal analysis, i.e., the concurrent analysis of multiple information modalities [1,2,3,4,5]. Automatic video content access, analysis and abstraction have thus emerged as potential applications.…”

Section: Introduction (mentioning, confidence: 99%)

“…In this work, we extend the audiovisual saliency-based video summarization algorithm in [5] to include text saliency automatically extracted from the subtitles information available with each movie distribution. For the computation of the frame-based text saliency metric the steps followed are: (i) extract the movie transcript from the subtitle file and perform shallow syntactic analysis including part-of-speech tagging, (ii) segment the audio stream using speech recognition technology to find the beginning and ending frame for each word in the transcript, and (iii) assign a text saliency value to each frame based on the parser tag assigned to the corresponding word.…”

Section: Introduction (mentioning, confidence: 99%)
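Step (iii) of the text-saliency procedure quoted above can be illustrated with a small Python sketch. It assumes steps (i) and (ii) have already produced a part-of-speech tag and a first/last frame for each word. The tag-to-weight table, `OTHER_WEIGHT` and the function names are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical weights keyed by Penn Treebank tag prefix; the excerpt does
# not state which tags the authors favor or by how much.
TAG_WEIGHTS = {"NN": 1.0, "VB": 0.7, "JJ": 0.5, "RB": 0.3}
OTHER_WEIGHT = 0.1


def tag_weight(tag):
    """Map a POS tag (e.g. 'NNS', 'VBZ') to a saliency weight by prefix."""
    for prefix, weight in TAG_WEIGHTS.items():
        if tag.startswith(prefix):
            return weight
    return OTHER_WEIGHT


def text_saliency(aligned_words, n_frames):
    """Step (iii): per-frame text saliency.

    aligned_words: iterable of (word, pos_tag, first_frame, last_frame), i.e.
    the assumed output of tagging (i) and speech alignment (ii).
    Frames with no aligned word keep saliency 0; overlaps keep the maximum.
    """
    sal = np.zeros(n_frames)
    for _word, tag, first, last in aligned_words:
        lo, hi = max(first, 0), min(last, n_frames - 1)
        if lo <= hi:
            sal[lo : hi + 1] = np.maximum(sal[lo : hi + 1], tag_weight(tag))
    return sal


words = [("The", "DT", 0, 1), ("dog", "NN", 2, 4), ("runs", "VBZ", 5, 6)]
print(text_saliency(words, 10))
```

In this toy input the noun "dog" spans frames 2 to 4 with weight 1.0, the verb "runs" spans frames 5 and 6 with weight 0.7, and frames 7 to 9 carry no speech, so they stay at 0.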

Video event detection and summarization using audio, visual and text saliency

Evangelopoulos, Zlatintsi, Σκούμας, et al. (2009)

2009 IEEE International Conference on Acoustics, Speech and Signal Processing

Self Cite

Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The various modality curves are integrated in a single attention curve, where the presence of an event may be signified in one or multiple domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability.
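The abstract integrates the modality curves into a single attention curve but does not give the fusion rule. Below is a minimal Python sketch that assumes min-max normalization followed by a weighted linear combination, a common baseline and not necessarily the authors' scheme. `fuse_saliency` and its default equal weights are hypothetical.

```python
import numpy as np


def minmax(curve):
    """Rescale a curve to [0, 1]; a constant curve maps to all zeros."""
    curve = np.asarray(curve, dtype=float)
    span = curve.max() - curve.min()
    return np.zeros_like(curve) if span == 0 else (curve - curve.min()) / span


def fuse_saliency(audio, visual, text, weights=(1.0, 1.0, 1.0)):
    """Hypothetical linear fusion of equal-length per-frame modality curves.

    Each curve is min-max normalized, then combined with the given weights
    (renormalized to sum to 1).
    """
    curves = [minmax(c) for c in (audio, visual, text)]
    if len({c.size for c in curves}) != 1:
        raise ValueError("modality curves must have the same number of frames")
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * c for wi, c in zip(w, curves))
```

Because the combination is additive, a strong peak in only one modality still raises the fused curve. This matches the abstract's remark that an event may be signified in one or multiple domains.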

“…They use motion, face, and camera attention along with audio attention models (audio saliency and speech/music) as cues to capture salient information and identify the audio and video segments to compose the summary. Rapantzikos et al. (Evangelopoulos et al. 2008, 2009) build further on visual, audio, and textual attention models for visual summarization. The authors form a multimodal saliency curve integrating the aural, visual, and textual streams of videos based on efficient audio, image, and language processing and employ it as a metric for video event detection and abstraction.…”

Section: Novelty Detection and Video Summarization (mentioning, confidence: 99%)

Vision, Attention Control, and Goals Creation System

Rapantzikos, Avrithis, Kolias (2010)

Perception-Action Cycle

Biological visual attention has long been studied by experts in cognitive psychology. The Holy Grail of this study is the exact modeling of the interaction between visual sensory input and the process of perception. There seems to be informal agreement on four important functions of the attention process: (a) the bottom-up process, which is responsible for the saliency of the input stimuli; (b) the top-down process, which biases attention toward known areas or regions of predefined characteristics; (c) attentional selection, which fuses information derived from the two previous processes and enables focus; and (d) the dynamic evolution of the attentional selection process. In the following, we outline established computational solutions for each of the four functions.

Overview

Most of our impressions and memories are based on vision. Nevertheless, vision mechanisms and functionalities are still not well understood. How do we perceive shape, color, or motion, and how do we automatically focus on the most informative parts of the visual input? It has long been established that primates, including humans, use focused attention and fast saccades to analyze visual stimuli based on the current situation or the desired goal. Neuroscientists have shown that neural information related to shape, motion, and color is transmitted to the brain through at least three parallel and interconnected channels rather than a single one. Hence a second question arises: how are these channels "linked" in order to provide useful information to the brain? The Human Visual System (HVS) creates a perceptual representation of the world that is quite different from the two-dimensional depiction on the retina.

Summarization of Videos by Image Quality Assessment

Cirne, Pedrini (2014)

Advanced Information Systems Engineering

Video summarization plays a key role in handling large collections of digital videos, making it faster to analyze their contents and aiding in browsing, indexing and retrieval. A straightforward way to produce summaries is to extract color features from the video frames. However, to generate summaries automatically as human beings would, the way humans perceive images must be considered, which can be done with image quality assessment (IQA) metrics. This work presents VSQUAL, a method for summarizing videos based on objective IQA metrics, which are also used for other purposes such as shot boundary detection and keyframe extraction. Results of the proposed method are compared against other approaches from the literature using a specific evaluation metric.
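The abstract mentions IQA metrics being reused for shot boundary detection. As an illustration only, the Python sketch below computes PSNR, one simple full-reference quality metric, between consecutive frames and flags a cut when it drops below a threshold. The choice of PSNR, the 20 dB threshold and the function names are assumptions and do not describe VSQUAL's actual metrics or parameters.

```python
import numpy as np


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    mse = np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


def shot_boundaries(frames, threshold_db=20.0):
    """Indices i where frame i is taken to start a new shot because
    PSNR(frame i-1, frame i) falls below a hypothetical threshold."""
    return [
        i
        for i in range(1, len(frames))
        if psnr(frames[i - 1], frames[i]) < threshold_db
    ]
```

Identical consecutive frames give infinite PSNR and never trigger a cut, while an abrupt change in content drives the PSNR down sharply.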
