2007
DOI: 10.1049/iet-ipr:20060040
Bottom-up spatiotemporal visual attention model for video analysis

Cited by 43 publications (35 citation statements)
References 37 publications
“…Specifically, we extend the spatial center-surround operator of Itti et al in a straightforward manner by using volumetric neighborhoods in a spatiotemporal Gaussian scale-space (Rapantzikos et al 2007). In such a framework, a video sequence is treated as a volume, which is created by stacking temporally consequent video frames.…”
Section: Volumetric Saliency by Feature Competition
confidence: 99%
“…This model has proven its efficiency in enhancing performance of a video classification system (Rapantzikos and Avrithis 2005). The two other saliency-based methods are the state-of-the-art static saliency-based approach of Itti et al (1998) and an extension using a motion map (Rapantzikos and Tsapatsoulis 2005).…”
Section: Evaluation of Classification Performance
confidence: 99%
“…The different orientations are then fused to produce a single orientation volume. More details can be found in [3]. Volumes for each feature, are decomposed into multiple scales.…”
Section: Visual Analysis
confidence: 99%
“…Perceptual attention is triggered by changes in the involved events like scene transitions, progressions or newly introduced themes. Computational models of attention have been previously developed using multimodal analysis, i.e., the concurrent analysis of multiple information modalities [1,2,3,4,5]. Automatic video content access, analysis and abstraction have thus emerged as potential applications.…”
Section: Introduction
confidence: 99%
“…Visual input is processed by computer vision [15] (see Section 4.1) or synthetic vision techniques [2], as appropriate, and stored in a short-term sensory storage. This acts as a temporary buffer and contains a large amount of raw data for short periods of time.…”
Section: General Framework
confidence: 99%
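The short-term sensory storage described above acts as a temporary buffer holding raw data only briefly. One way to sketch such a buffer is a bounded, time-limited store; the class name, capacity, and time-to-live below are illustrative assumptions, not part of the cited framework:

```python
from collections import deque
import time

class SensoryStore:
    """Hypothetical short-term sensory storage: a bounded buffer whose
    items expire after a short time window."""

    def __init__(self, max_items=64, ttl=0.5):
        self.buffer = deque(maxlen=max_items)  # oldest entries drop out first
        self.ttl = ttl  # seconds an item remains retrievable

    def push(self, raw):
        """Store one raw observation with its arrival time."""
        self.buffer.append((time.monotonic(), raw))

    def recent(self):
        """Return only the items still inside the time window."""
        now = time.monotonic()
        return [d for t, d in self.buffer if now - t <= self.ttl]

store = SensoryStore(max_items=4, ttl=0.5)
store.push("frame0")
store.push("frame1")
print(store.recent())  # ['frame0', 'frame1']
```

The `maxlen` bound models the limited capacity of the buffer, while the time-to-live check models the short retention period of raw sensory data before it is either attended to or discarded.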