A hierarchical definition of optical variability is proposed that links physical magnitudes to visual saliency and yields a more reductionist interpretation than previous approaches. This definition is shown to be grounded on the classical efficient coding hypothesis. Moreover, we propose that a major goal of contextual adaptation mechanisms is to ensure the invariance of the behavior that the contribution of an image point to optical variability elicits in the visual system. This hypothesis and the necessary assumptions are tested through the comparison with human fixations and state-of-the-art approaches to saliency in three open access eye-tracking datasets, including one devoted to images with faces, as well as in a novel experiment using hyperspectral representations of surface reflectance. The results on faces yield a significant reduction of the potential strength of semantic influences compared to previous works. The results on hyperspectral images support the assumptions to estimate optical variability. As well, the proposed approach explains quantitative results related to a visual illusion observed for images of corners, which does not involve eye movements.
General dynamic scenes involve multiple rigid and flexible objects, with relative and common motion, camera induced or not. The complexity of the motion events together with their strong spatio-temporal correlations make the estimation of dynamic visual saliency a big computational challenge. In this work, we propose a computational model of saliency based on the assumption that perceptual relevant information is carried by high-order statistical structures. Through whitening, we completely remove the second-order information (correlations and variances) of the data, gaining access to the relevant information. The proposed approach is an analytically tractable and computationally simple framework which we call Dynamic Adaptive Whitening Saliency (AWS-D). For model assessment, the provided saliency maps were used to predict the fixations of human observers over six public video datasets, and also to reproduce the human behavior under certain psychophysical experiments (dynamic pop-out). The results demonstrate that AWS-D beats state-of-the-art dynamic saliency models, and suggest that the model might contain the basis to understand the key mechanisms of visual saliency. Experimental evaluation was performed using an extension to video of the well-known methodology for static images, together with a bootstrap permutation test (random label hypothesis) which yields additional information about temporal evolution of the metrics statistical significance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.