Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video

Li, Jia; Tian, Yonghong; Huang, Tiejun; Gao, Wen

doi:10.1007/s11263-010-0354-6

Cited by 132 publications

(75 citation statements)

References 27 publications

“…5) from left to right and top to bottom. They can be categorized into: orientation pop-out (3,9,21,25,38,43,51,54), texture pop-out (6,12,14,24,36,39,47), curvature pop-out (35,48), size pop-out (8,10,17,30,52), grouping (2,13,26,28,34), color pop-out (1,4,16,19,20,27,29,31,32,33,41,44,50,53), intensity pop-out (11,18,37,42), search asymmetry (5;15, 22;46, 40;49), and other complex search arrays (7,23). In some patterns, targets are embedded in noise (e.g., speckle noise: 11, 20, 31 and orientation noise: 19, 41).…”

Section: B Stimuli (mentioning)

confidence: 99%

Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study

Borji

Sihite

Itti

2013

IEEE Trans. on Image Process.

Abstract: Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as "visual saliency". Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, 3 natural image datasets, and 2 video datasets, using 3 evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our study allows one to assess the state of the art, helps organize this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.

“…The knowledge-based models have a potential to apply various machine learning techniques. For instance, Li et al [123] introduced multi-task learning to simulate the conjunction search (cf. Sect.…”

Section: Discussion (mentioning)

confidence: 99%

Computational Models of Human Visual Attention and Their Implementations: A Survey

Kimura

Yonetani

Hirayama

2013

IEICE Trans. Inf. & Syst.

SUMMARY: We humans can easily and instantaneously detect the regions in a visual scene that are most likely to contain something of interest. Exploiting this pre-selection mechanism, called visual attention, in image and video processing systems would make them more sophisticated and therefore more useful. This paper briefly describes various computational models of human visual attention and their development, as well as related psychophysical findings. In particular, our objective is to carefully distinguish several types of studies related to human visual attention and saliency as a measure of attentiveness, and to provide a taxonomy from several viewpoints such as the main objective, the use of additional cues and mathematical principles. This survey finally discusses possible future directions for research into human visual attention and saliency computation.

key words: human visual attention, computational model, saliency, bottom-up, top-down

†† The author is with the Graduate School of Informatics, Kyoto University, Kyoto-shi, 606-8501 Japan.
††† The author is with the Graduate School of Information Science, Nagoya University, Nagoya-shi, 464-8603 Japan.
a) E-mail: akisato@ieee.org
b) E-mail: yonetani@vision.kuee.kyoto-u.ac.jp
c) E-mail: hirayama@is.nagoya-u.ac.jp
DOI: 10.1587/transinf.E96.D.562

Motivation

Developing sophisticated algorithms for detecting and recognizing objects in a given image or video has been a long-standing challenge in the pattern recognition and computer vision research fields. In fact, a huge number of studies, techniques and theories related to object detection and recognition have already been developed. In particular, several methods for detecting certain specific categories of objects such as human bodies and human faces have already been put to practical use in, for example, surveillance, authentication and the human-centric enhancement of image quality, with the best possible use of the prior knowledge of target objects (human bodies and faces) [1], [2].

However, generic object detection and recognition without any constraints as regards the target objects has remained a major challenge, because (1) various kinds of objects might constitute the targets and (2) target objects in the same category might have different appearances due to variations of instances in a specific category, illumination changes and so on.

On the other hand, human beings seem to be able to detect various kinds of objects without any thought or effort. For example, in Fig. 1 (left), we can easily and instantly detect a red car, a blue traffic sign and a broad white line. Visual attention [3] is considered to play an important role in achieving this function. Visual attention is one of the built-in mechanisms of the human visual system that quickly selects regions in a visual scene that are most likely to contain items of interest. Such a pre-selection mechanism focusing only on relevant data would be essential in enabling computers to undertake subsequent processing such as generic o...

“…Vig et al [9] used 3D spatio-temporal volumes from video for spatio-temporal saliency modeling. Li et al [10] proposed a multi-tasking Bayesian approach for combining bottom-up and top-down saliency components. Kimura et al [11] learned a Dynamic Bayesian Network (DBN) to predict the likelihood of locations where humans typically focus on a video scene.…”

Section: A Bottom-up (Bu) Models (mentioning)

confidence: 99%

Modeling the influence of action on spatial attention in visual interactive environments

Borji

Sihite

Itti

2012

2012 IEEE International Conference on Robotics and Automation

Abstract: A large number of studies have been reported on top-down influences of visual attention. However, less progress has been made in understanding and modeling its mechanisms in real-world tasks. In this paper, we propose an approach for learning spatial attention taking into account influences of physical actions on top-down attention. For this purpose, we focus on interactive visual environments (video games), which are modest real-world simulations, where a player has to attend to certain aspects of visual stimuli and perform actions to achieve a goal. The basic idea is to learn a mapping from the current mental state of the game player, represented by past actions and observations, to the player's gaze fixation. A data-driven approach is followed where we train a model from the data of some players and test it on a new subject. In particular, the two contributions this paper makes are: 1) employing multimodal information including mean eye position, gist of a scene, physical actions, bottom-up saliency, and tagged events for state representation and 2) analysis of different methods of combining bottom-up and top-down influences. Compared with other top-down task-driven and bottom-up spatio-temporal models, our approach shows higher NSS scores in predicting eye positions.

I. INTRODUCTION

The concept of saliency has attracted a lot of attention over the past several years. Basically, it is a fast and low-cost pre-processing step to select important image regions or objects to pass to higher-level and computationally demanding processes.

The main concern in modeling saliency is how, when, and based on what, to select salient image regions. It is often assumed that attention is attracted by salient stimuli or events in the visual array [1] [2]. While this is the case, it is also known that a larger portion of attentional behavior comes from ongoing task inferences which dynamically change and are dependent on the algorithm of the task.

Computational modeling of task influences on attention is conceptually hard to frame. The biggest challenge comes from the fact that we don't know much about how humans perform complex tasks. This has been the focus of artificial intelligence (AI) and cognitive science research for the past 50 years. However, we know to some extent about algorithms and attentional behaviors of some laboratory-scale stimuli and tasks. One solution when dealing with complex problems is learning from data, experiences or history which could be gathered from the behavior of other humans, especially when the goal is to explain human data.

There are already many bottom-up saliency models for static (still images) and spatio-temporal stimuli (videos).
