Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
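The core comparison behind the index described above can be sketched numerically. The snippet below is a minimal illustration, not the published algorithm: it computes the luminance, contrast, and structure terms from global image statistics rather than the local sliding windows used in the paper, with the commonly used constants K1 = 0.01 and K2 = 0.03.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Simplified structural-similarity score from whole-image statistics.

    A sketch only: the published index evaluates these terms in local
    windows and averages the resulting map. Constants follow the common
    K1 = 0.01, K2 = 0.03 convention.
    """
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, (64, 64))
noisy = np.clip(img + rng.normal(0, 25, img.shape), 0, 255)
print(ssim_global(img, img))    # identical images score 1.0
print(ssim_global(img, noisy))  # distorted copy scores below 1.0
```

Identical inputs score exactly 1, and the score falls as distortion grows, which is the qualitative behavior the framework is built around.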
The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multi-scale structural similarity method, which offers more flexibility than previous single-scale methods in incorporating variations in viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.
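The multi-scale idea can be sketched as follows: compute a contrast-structure comparison at successively downsampled resolutions and combine the per-scale values with exponent weights. This is a simplified stand-in, assuming global statistics and plain 2x2 average pooling in place of the paper's local windows and low-pass filtering; the five weights are those commonly reported for five-scale analyses in the multi-scale SSIM literature.

```python
import numpy as np

# Per-scale exponent weights commonly reported for a five-scale analysis.
WEIGHTS = np.array([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])

def downsample(img):
    """2x2 average pooling, a stand-in for low-pass filtering + subsampling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def cs_term(x, y, data_range=255.0):
    """Contrast-structure comparison at one scale (global statistics)."""
    C2 = (0.03 * data_range) ** 2
    cov = ((x - x.mean()) * (y - y.mean())).mean()
    return (2 * cov + C2) / (x.var() + y.var() + C2)

def msssim_sketch(x, y):
    """Weighted product of per-scale comparisons (luminance term omitted)."""
    vals = []
    for _ in range(len(WEIGHTS)):
        vals.append(cs_term(x, y))
        x, y = downsample(x), downsample(y)
    return float(np.prod(np.array(vals) ** WEIGHTS))

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, (64, 64))
print(msssim_sketch(img, img))  # identical images score 1.0
```

The weighted product makes the relative importance of each scale explicit, which is exactly the set of parameters the synthesis-based calibration in the paper is designed to determine.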
Statistical dependencies in the responses of sensory neurons govern both the amount of stimulus information conveyed and the means by which downstream neurons can extract it. Although a variety of measurements indicate the existence of such dependencies 1-3, their origin and importance for neural coding are poorly understood. Here we analyse the functional significance of correlated firing in a complete population of macaque parasol retinal ganglion cells using a model of multi-neuron spike responses 4,5. The model, with parameters fit directly to physiological data, simultaneously captures both the stimulus dependence and detailed spatio-temporal correlations in population responses, and provides two insights into the structure of the neural code. First, neural encoding at the population level is less noisy than one would expect from the variability of individual neurons: spike times are more precise, and can be predicted more accurately when the spiking of neighbouring neurons is taken into account. Second, correlations provide additional sensory information: optimal, model-based decoding that exploits the response correlation structure extracts 20% more information about the visual scene than decoding under the assumption of independence, and preserves 40% more visual information than optimal linear decoding 6. This model-based approach reveals the role of correlated activity in the retinal coding of visual stimuli, and provides a general framework for understanding the importance of correlated activity in populations of neurons.

How does the spiking activity of a neural population represent the sensory environment? The answer depends critically on the structure of neuronal correlations, or the tendency of groups of neurons to fire temporally coordinated spike patterns.
The statistics of such patterns have been studied in a variety of brain areas, and their significance in the processing and representation of sensory information has been debated extensively 2,3,7-13. Previous studies have examined visual coding by pairs of neurons 11 and the statistics of simultaneous firing patterns in larger neural populations 14,15. However, no previous approach has addressed how correlated spiking activity in complete neural populations depends on the pattern of visual stimulation, or how such dependencies affect the encoding of visual stimuli. Here we introduce a model-based methodology for studying this problem. We describe the encoding of stimuli in the spike trains of a neural population with a generalized linear model (Fig. 1a), a generalization of th...
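The generalized-linear-model description above can be sketched in simulation: each cell's conditional spike rate is the exponential of a stimulus drive plus a post-spike (self-history) filter plus coupling filters driven by the other cells' recent spikes. All filter values below are hypothetical illustrations for a two-neuron toy circuit, not parameters fitted to data.

```python
import numpy as np

rng = np.random.default_rng(1)

T, dt = 2000, 0.001                        # time bins, bin width (s)
stim_drive = np.full((2, T), np.log(20.0))  # constant drive: ~20 spikes/s
post_spike = np.array([-5.0, -2.0, -1.0])   # self-suppression (refractoriness)
coupling = np.array([0.8, 0.4, 0.2])        # excitatory cross-coupling

spikes = np.zeros((2, T), dtype=int)
for t in range(T):
    for i in range(2):
        j = 1 - i
        h = s = 0.0
        # Sum filter contributions from the last three bins of spiking.
        for k, (w_self, w_cross) in enumerate(zip(post_spike, coupling), 1):
            if t - k >= 0:
                h += w_self * spikes[i, t - k]
                s += w_cross * spikes[j, t - k]
        rate = np.exp(stim_drive[i, t] + h + s)    # conditional intensity
        spikes[i, t] = rng.poisson(rate * dt) > 0  # at most one spike per bin
print(spikes.sum(axis=1))  # spike counts for the two coupled neurons
```

Because the coupling filters enter the conditional intensity directly, decoding that uses them can exploit the correlation structure of the population response, which is the comparison at the heart of the information-gain results above.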
We describe a method for removing noise from digital images, based on a statistical model of the coefficients of an overcomplete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales are modeled as the product of two independent random variables: a Gaussian vector and a hidden positive scalar multiplier. The latter modulates the local variance of the coefficients in the neighborhood, and is thus able to account for the empirically observed correlation between the coefficient amplitudes. Under this model, the Bayesian least squares estimate of each coefficient reduces to a weighted average of the local linear estimates over all possible values of the hidden multiplier variable. We demonstrate through simulations with images contaminated by additive white Gaussian noise that the performance of this method substantially surpasses that of previously published methods, both visually and in terms of mean squared error.

Index Terms: Bayesian estimation, Gaussian scale mixtures, hidden Markov model, natural images, noise removal, overcomplete representations, statistical models, steerable pyramid.

The artifacts arising from many imaging devices are quite different from the images that they contaminate, and this difference allows humans to "see past" the artifacts to the underlying image. The goal of image restoration is to relieve human observers from this task (and perhaps even to improve upon their abilities) by reconstructing a plausible estimate of the original image from the distorted or noisy observation. A prior probability model for both the noise and the uncorrupted images is of central importance for this application. Modeling the statistics of natural images is a challenging task, partly because of the high dimensionality of the signal. Two basic assumptions are commonly made in order to reduce dimensionality. The first is that the probability structure may be defined locally. Typically, one makes a Markov assumption: the probability density of a pixel, conditioned on a set of neighbors, is independent of the pixels beyond the neighborhood. The second is an assumption of spatial homogeneity: the distribution of values in a neighborhood is the same for all such neighborhoods, regardless of absolute spatial position. The Markov random field model that results from these two assumptions i...

[Manuscript received September 29, 2002; revised April 28, 2003. During the development of this work, V. Strela was on leave from Drexel University, supported by an AMS Centennial Fellowship. M. J. Wainwright was supported by an NSERC-1967 Fellowship. J. Portilla and E. P. Simoncelli were supported by an NSF CAREER grant and an Alfred P. Sloan Fellowship to E. P. Simoncelli, and by the Howard Hughes Medical Institute. J. Portilla was also supported by an FPI fellowship, and subsequently by a "Ramón y Cajal" grant (both from the Spanish government). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Mario A. T. Figueiredo.]
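The "weighted average of local linear estimates over all possible values of the hidden multiplier" can be sketched in one dimension. The sketch below assumes a scalar Gaussian-scale-mixture model (observed y = sqrt(z)*u + w) with a hypothetical discretized log-normal prior on the multiplier z; the paper itself operates on vector neighborhoods of steerable-pyramid coefficients.

```python
import numpy as np

def bls_gsm_1d(y, s2=1.0, n2=0.25):
    """Bayes least-squares estimate under a 1-D Gaussian scale mixture.

    Observed y = sqrt(z)*u + w with u ~ N(0, s2), noise w ~ N(0, n2),
    and hidden multiplier z on a discrete grid with a (hypothetical)
    log-normal-shaped prior. The estimate is the posterior-weighted
    average of the per-z Wiener (linear) estimates.
    """
    z = np.geomspace(1e-2, 1e2, 200)                 # grid over multiplier
    p_z = np.exp(-0.5 * np.log(z) ** 2) / z          # prior on z (unnormalized)
    p_z /= p_z.sum()
    var = z * s2 + n2                                 # marginal variance given z
    lik = np.exp(-0.5 * y ** 2 / var) / np.sqrt(var)  # p(y | z)
    post = lik * p_z
    post /= post.sum()                                # p(z | y)
    wiener = z * s2 / (z * s2 + n2) * y               # linear estimate given z
    return float(np.sum(post * wiener))               # E[x | y]

print(bls_gsm_1d(0.1))  # small coefficients are shrunk strongly
print(bls_gsm_1d(3.0))  # large coefficients are mostly preserved
```

The adaptive behavior (heavy shrinkage of small coefficients, light shrinkage of large ones) is what distinguishes this estimator from a single global Wiener filter.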
It has long been assumed that sensory neurons are adapted, through both evolutionary and developmental processes, to the statistical properties of the signals to which they are exposed. Attneave (1954) and Barlow (1961) proposed that information theory could provide a link between environmental statistics and neural responses through the concept of coding efficiency. Recent developments in statistical modeling, along with powerful computational tools, have enabled researchers to study more sophisticated statistical models for visual images, to validate these models empirically against large sets of data, and to begin experimentally testing the efficient coding hypothesis for both individual neurons and populations of neurons.
The human capacity to recognize complex visual patterns emerges in a sequence of brain areas known as the ventral stream, beginning with primary visual cortex (V1). We develop a population model for mid-ventral processing, in which non-linear combinations of V1 responses are averaged within receptive fields that grow with eccentricity. To test the model, we generate novel forms of visual metamers — stimuli that differ physically, but look the same. We develop a behavioral protocol that uses metameric stimuli to estimate the receptive field sizes in which the model features are represented. Because receptive field sizes change along the ventral stream, the behavioral results can identify the visual area corresponding to the representation. Measurements in human observers implicate V2, providing a new functional account of this area. The model explains deficits of peripheral vision known as “crowding”, and provides a quantitative framework for assessing the capabilities of everyday vision.
Responses of sensory neurons differ across repeated measurements. This variability is usually treated as stochasticity arising within neurons or neural circuits. However, some portion of the variability arises from fluctuations in excitability due to factors that are not purely sensory, such as arousal, attention, and adaptation. To isolate these fluctuations, we developed a model in which spikes are generated by a Poisson process whose rate is the product of a drive that is sensory in origin, and a gain summarizing stimulus-independent modulatory influences on excitability. This model provides an accurate account of response distributions of visual neurons in macaque LGN, V1, V2, and MT, revealing that variability originates in large part from excitability fluctuations which are correlated over time and between neurons, and which increase in strength along the visual pathway. The model provides a parsimonious explanation for observed systematic dependencies of response variability and covariability on firing rate.
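The drive-times-gain structure described above has a simple quantitative signature: if the spike count is Poisson with rate f*g and the gain g fluctuates across trials, the across-trial variance exceeds the Poisson prediction (variance = mean). A minimal sketch, assuming a gamma-distributed gain with mean 1 (a convenient hypothetical choice, not the paper's fitted form):

```python
import numpy as np

rng = np.random.default_rng(2)

f = 10.0            # sensory drive: expected spike count per trial
trials = 200_000
# Gain fluctuates trial-to-trial: gamma with mean 1 and variance 0.25.
g = rng.gamma(shape=4.0, scale=0.25, size=trials)
counts = rng.poisson(f * g)   # doubly stochastic ("modulated") Poisson

# Law of total variance: Var = f + f**2 * Var(g) = 10 + 100*0.25 = 35,
# well above the pure-Poisson prediction Var = mean = 10.
print(counts.mean())  # ~ 10
print(counts.var())   # ~ 35 (super-Poisson)
```

The excess variance grows as the square of the firing rate, which is how the model accounts for the systematic rate-dependence of variability noted in the abstract.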
Human visual speed perception is qualitatively consistent with an optimal Bayesian observer that combines noisy measurements with a prior preference for lower speeds. Quantitative validation of this model, however, is difficult because the precise noise characteristics and prior expectations are unknown. Here, we present an augmented observer model that accounts for the variability of subjective responses in a speed discrimination task. This allows us to infer the shape of the prior probability as well as the internal noise characteristics directly from psychophysical data. For all subjects, we find that the fitted model provides an accurate account of the data across a wide range of stimulus parameters. The inferred prior distribution exhibits significantly heavier tails than a Gaussian, and the amplitude of the internal noise is approximately proportional to stimulus speed, and depends inversely on stimulus contrast. The framework is general, and should prove applicable to other experiments and perceptual modalities.
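The observer model above combines a noisy speed measurement with a low-speed prior, so the resulting estimate is biased toward slower speeds. A grid-based sketch of that computation, in which the heavy-tailed prior shape and the noise level are hypothetical stand-ins for the quantities the paper infers from psychophysical data:

```python
import numpy as np

def posterior_mean_speed(measurement, noise_sd=1.0):
    """Posterior-mean speed estimate under a low-speed prior (sketch).

    The prior 1/(1 + v^2) is a hypothetical heavy-tailed, decreasing
    prior; the likelihood is Gaussian around the noisy measurement.
    """
    v = np.linspace(0.01, 30.0, 3000)            # candidate speeds
    prior = 1.0 / (1.0 + v ** 2)                  # preference for low speeds
    lik = np.exp(-0.5 * ((measurement - v) / noise_sd) ** 2)
    post = prior * lik
    post /= post.sum()                            # normalized posterior
    return float(np.sum(post * v))                # posterior mean estimate

m = 10.0
print(posterior_mean_speed(m))  # below 10: biased toward slower speeds
```

Because the prior is decreasing, the posterior mean always sits below the raw measurement, and the size of that bias grows as the measurement noise grows, matching the qualitative contrast effects the model is meant to explain.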