Abstract: We introduce a general framework for end-to-end optimization of the rate-distortion performance of nonlinear transform codes assuming scalar quantization. The framework can be used to optimize any differentiable pair of analysis and synthesis transforms in combination with any differentiable perceptual metric. As an example, we consider a code built from a linear transform followed by a form of multi-dimensional local gain control. Distortion is measured with a state-of-the-art perceptual metric. When optimized over a large database of images, this representation offers substantial improvements in bitrate and perceptual appearance over fixed (DCT) codes, and over linear transform codes optimized for mean squared error.
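To fix ideas, a minimal sketch of this kind of end-to-end rate-distortion optimization is given below. It is an assumed illustration, not the authors' code: it uses PyTorch, additive uniform noise as a differentiable stand-in for scalar quantization, a factorized Laplacian with learned scales as a crude rate model, and MSE in place of the perceptual distortion term. The names LinearGDN and rd_loss are illustrative.

import torch
import torch.nn as nn

class LinearGDN(nn.Module):
    # Linear analysis transform followed by a simple divisive (local gain control) nonlinearity.
    def __init__(self, dim):
        super().__init__()
        self.A = nn.Linear(dim, dim, bias=False)           # analysis transform
        self.beta = nn.Parameter(torch.ones(dim))           # additive constants in the denominator
        self.gamma = nn.Parameter(0.1 * torch.ones(dim))    # gain-control weights (diagonal simplification)

    def forward(self, x):
        y = self.A(x)
        return y / torch.sqrt(self.beta ** 2 + self.gamma ** 2 * y ** 2)

def rd_loss(x, analysis, synthesis, log_scale, lam=0.01):
    # Rate-distortion objective with a differentiable quantization proxy.
    y = analysis(x)
    y_tilde = y + (torch.rand_like(y) - 0.5)                # additive uniform noise ~ scalar quantization
    x_hat = synthesis(y_tilde)
    distortion = ((x - x_hat) ** 2).mean()                  # stand-in for a perceptual metric
    scale = log_scale.exp()
    rate = (y_tilde.abs() / scale + torch.log(2 * scale)).mean()  # Laplacian negative log-likelihood
    return distortion + lam * rate

In this sketch, a synthesis module of the same form (or a plain linear layer) and a parameter log_scale = nn.Parameter(torch.zeros(dim)) would be trained jointly with the analysis transform by minimizing rd_loss with any stochastic gradient optimizer over a database of image patches.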
A comprehensive investigation. Gustau Camps-Valls, Jochem Verrelst, Jordi Muñoz-Marí, Valero Laparra, Fernando Mateo-Jiménez, and José Gómez-Dans. Advances in Machine Learning for Remote Sensing and Geosciences, IEEE Geoscience and Remote Sensing Magazine, June 2016.

... illustrative examples. In particular, important problems for land, ocean, and atmosphere monitoring are considered, from accurately estimating oceanic chlorophyll content and pigments to retrieving vegetation properties from multi- and hyperspectral sensors, as well as estimating atmospheric parameters (e.g., temperature, moisture, and ozone) from infrared sounders.

An Unprecedented Data Stream for Land, Ocean, and Atmosphere Monitoring

Spatiotemporally explicit, quantitative retrieval methods for the characteristics of Earth's surface and atmosphere are required in a variety of Earth system applications. Optical Earth-observing satellites endowed with high temporal resolution enable the retrieval and, hence, the monitoring of climate and biogeophysical variables [1], [2]. With the forthcoming superspectral Copernicus Sentinel-2 (S2) [3] and Sentinel-3 missions [4], as well as the planned EnMAP [5], HyspIRI [6], PRISMA [7], and the European Space Agency's candidate FLEX [8], an unprecedented data stream for land, ocean, and atmosphere monitoring will soon become available to a diverse user community. This vast data stream requires enhanced processing techniques that are accurate, robust, and fast. Additionally, the statistical models should capture plausible physical relationships and explain the problem at hand. A wide variety of biogeophysical retrieval methods have been developed over the last few decades, but only a few of them have made it into operational processing chains, and many are still in their infancy [9]. Essentially, there are two main approaches to the inverse problem of estimating biophysical parameters from spectra: 1) parametric, physically based models and 2) nonparametric statistical models. On one hand, parametric, physically based models are commonly used to model biological processes and climate variables in Earth monitoring. These models rely on established physical relationships and implement complex combinations of scientific hypotheses. Unfortunately, they do not exploit empirical data to constrain simulation outcomes; thus, despite their solid physical foundation, they are becoming more obscure as more complex processes, parameterizations, and priors need to be included. These issues give rise to overly rigid solutions and large model discrepancies (see [10] and the references therein). Alternatively, nonparametric statistical models are typically concerned only with developing data-driven models, paying little attention to the physical rules governing the system. The field has proven successful in many disciplines of science and engineering [11], and, in general, nonlinear and nonparametric model instantiations typically lead to more flexible and improved performance.
Mechanisms of human color vision are characterized by two phenomenological aspects: the system is nonlinear and adaptive to changing environments. Conventional attempts to derive these features from statistics use separate arguments for each aspect. The few statistical approaches that do consider both phenomena simultaneously follow parametric formulations based on empirical models. Therefore, it may be argued that the behavior does not come directly from the color statistics but from the convenient functional form adopted. In addition, the whole statistical analysis is often based on simplified databases that disregard relevant physical effects in the input signal, for instance by assuming flat Lambertian surfaces. In this work, we address the simultaneous statistical explanation of (i) the nonlinear behavior of achromatic and chromatic mechanisms in a fixed adaptation state, and (ii) the change of such behavior, i.e. adaptation, under changes of observation conditions. Both phenomena emerge directly from the samples through a single data-driven method: Sequential Principal Curves Analysis (SPCA) with local metric. SPCA is a new manifold learning technique that derives a set of sensors adapted to the manifold using different optimality criteria. Moreover, in order to reproduce the empirical adaptation reported under D65 and A illuminations, a new database of colorimetrically calibrated images of natural objects under these illuminants was gathered, thus overcoming the limitations of available databases. The results obtained by applying SPCA show that the psychophysical behavior on color discrimination thresholds, discounting of the illuminant, and corresponding pairs in asymmetric color matching emerges directly from realistic data regularities assuming no a priori functional form. These results provide stronger evidence for the hypothesis of a statistically driven organization of color sensors. Moreover, they suggest that color perception at this low abstraction level may be guided by an error minimization strategy rather than by the information maximization principle.
Structural similarity metrics and information-theory-based metrics have been proposed as completely different alternatives to the traditional metrics based on error visibility and human vision models. Three basic criticisms were raised against the traditional error visibility approach: (1) it is based on near-threshold performance, (2) its geometric meaning may be limited, and (3) stationary pooling strategies may not be statistically justified. These criticisms, together with the good performance of structural and information-theory-based metrics, have popularized the idea of their superiority over the error visibility approach. In this work we show, either experimentally or analytically, that the above criticisms do not apply to error visibility metrics that use a general enough divisive normalization masking model. Therefore, the traditional divisive normalization metric is not intrinsically inferior to the newer approaches. In fact, experiments on a number of databases including a wide range of distortions show that divisive normalization is fairly competitive with the newer approaches, robust, and easy to interpret in linear terms. These results suggest that, despite the criticisms of the traditional error visibility approach, divisive normalization masking models should be considered in the image quality discussion.
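For readers unfamiliar with the masking model invoked here, a generic divisive normalization response takes a form along the following lines (the exponents and interaction kernel vary across papers; the symbols below are illustrative rather than the exact parameterization used in the cited work):

r_i = \frac{\mathrm{sign}(y_i)\,|y_i|^{\gamma}}{\beta_i + \sum_j h_{ij}\,|y_j|^{\gamma}}

where the y_i are coefficients of a linear transform (e.g., a wavelet or local frequency decomposition), beta_i sets the semisaturation level, and h_{ij} couples each coefficient to its spatial, orientation, and scale neighbors. An error-visibility quality metric of this family then scores a distorted image by the norm of the difference between the normalized responses r of the distorted and original images.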
Abstract: Most signal processing problems involve the challenging task of multidimensional probability density function (PDF) estimation. In this work, we propose a solution to this problem by using a family of Rotation-based Iterative Gaussianization (RBIG) transforms. The general framework consists of the sequential application of a univariate marginal Gaussianization transform followed by an orthonormal transform. The proposed procedure looks for differentiable transforms to a known PDF so that the unknown PDF can be estimated at any point of the original domain. In particular, we aim at a zero-mean, unit-covariance Gaussian for convenience. RBIG is formally similar to classical iterative Projection Pursuit (PP) algorithms. However, we show that, unlike in PP methods, the particular class of rotations used has no special qualitative relevance in this context, since looking for interestingness is not a critical issue for PDF estimation. The key difference is that our approach focuses on the univariate part (marginal Gaussianization) of the problem rather than on the multivariate part (rotation). This difference implies that one may select the most convenient rotation suited to each practical application. The differentiability, invertibility, and convergence of RBIG are theoretically and experimentally analyzed. Relations to other methods, such as Radial Gaussianization (RG), one-class support vector domain description (SVDD), and deep neural networks (DNN), are also pointed out. The practical performance of RBIG is successfully illustrated in a number of multidimensional problems such as image synthesis, classification, denoising, and multi-information estimation.
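As a concrete illustration of the iteration described above, here is a minimal NumPy/SciPy sketch (an assumption for illustration, not the authors' implementation): each pass applies a rank-based marginal Gaussianization to every dimension, then an orthonormal rotation (a random one here; PCA or ICA rotations fit the same template). The sketch omits the bookkeeping needed to invert the transform and to evaluate the estimated PDF.

import numpy as np
from scipy.stats import norm

def marginal_gaussianization(X, eps=1e-6):
    # Map each column to approximately N(0, 1) via its empirical CDF (rank transform).
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0)       # per-dimension ranks
    u = (ranks + 0.5) / n                           # empirical CDF values in (0, 1)
    return norm.ppf(np.clip(u, eps, 1 - eps))       # inverse Gaussian CDF

def rbig(X, n_iter=50, seed=0):
    # Rotation-based Iterative Gaussianization: alternate marginal Gaussianization
    # with an orthonormal rotation until the joint density is approximately Gaussian.
    rng = np.random.default_rng(seed)
    Y = np.asarray(X, dtype=float).copy()
    d = Y.shape[1]
    for _ in range(n_iter):
        Y = marginal_gaussianization(Y)
        Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthonormal rotation
        Y = Y @ Q
    return Y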
We present an image quality metric based on the transformations associated with the early visual system: local luminance subtraction and local gain control. Images are decomposed using a Laplacian pyramid, which subtracts a local estimate of the mean luminance at multiple scales. Each pyramid coefficient is then divided by a local estimate of amplitude (a weighted sum of the absolute values of its neighbors), where the weights are optimized for prediction of amplitude using (undistorted) images from a separate database. We define the quality of a distorted image, relative to its undistorted original, as the root mean squared error in this "normalized Laplacian" domain. We show that both the luminance subtraction and the amplitude division stages lead to significant reductions in redundancy relative to the original image pixels. We also show that the resulting quality metric provides a better account of human perceptual judgements than either MS-SSIM or a recently published gain-control metric based on oriented filters.
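A rough sketch of the two-stage computation follows. It is a hypothetical simplification: the abstract says the normalization weights are optimized on a database of undistorted images, whereas the code below assumes a fixed box neighborhood and a constant stabilizer, and the function names are illustrative.

import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def laplacian_pyramid(img, n_levels=4):
    # Each level is the image minus a local (low-pass) mean: the luminance subtraction stage.
    levels, current = [], np.asarray(img, dtype=float)
    for _ in range(n_levels):
        low = gaussian_filter(current, sigma=1.0)
        levels.append(current - low)     # bandpass residual
        current = low[::2, ::2]          # downsample for the next scale
    levels.append(current)               # final low-pass residual
    return levels

def normalize(level, size=5, eps=0.1):
    # Divide each coefficient by a local amplitude estimate: the gain-control stage.
    amplitude = uniform_filter(np.abs(level), size=size)
    return level / (amplitude + eps)

def nlp_distance(ref, dist, n_levels=4):
    # Root mean squared error in the normalized Laplacian domain, averaged over scales.
    errs = []
    for a, b in zip(laplacian_pyramid(ref, n_levels), laplacian_pyramid(dist, n_levels)):
        errs.append(np.sqrt(np.mean((normalize(a) - normalize(b)) ** 2)))
    return float(np.mean(errs))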
The conventional argument in computational neuroscience in favor of the efficient coding hypothesis goes from image statistics to perception. It has been argued that the behavior of the early stages of biological visual processing (e.g., spatial frequency analyzers and their nonlinearities) may be obtained from image samples and the efficient coding hypothesis using no psychophysical or physiological information. In this work we address the same issue in the opposite direction: from perception to image statistics. We show that a psychophysically fitted image representation in V1 has appealing statistical properties, for example, approximate PDF factorization and substantial mutual information reduction, even though no statistical information is used to fit the V1 model. These results are complementary evidence in favor of the efficient coding hypothesis.
When adapted to a particular scene our senses may fool us: colors are misinterpreted, certain spatial patterns seem to fade out, and static objects appear to move in reverse. A mere empirical description of the mechanisms tuned to color, texture, and motion may tell us where these visual illusions come from. However, such empirical models of gain control do not explain why these mechanisms work in this apparently dysfunctional manner. Current normative explanations of aftereffects based on scene statistics derive gain changes by (1) invoking decorrelation and linear manifold matching/equalization, or (2) using nonlinear divisive normalization obtained from parametric scene models. These principled approaches have different drawbacks: the first is not compatible with the known saturation nonlinearities in the sensors, and it cannot fully accomplish information maximization due to its linear nature; in the second, the gain change is largely determined a priori by the parametric image model assumed in the divisive normalization. In this study we show that both the response changes that lead to aftereffects and the nonlinear behavior can be simultaneously derived from a single statistical framework: Sequential Principal Curves Analysis (SPCA). As opposed to mechanistic models, SPCA is not intended to describe how physiological sensors work, but is focused on explaining why they behave as they do. Nonparametric SPCA has two key advantages as a normative model of adaptation: (i) it improves on linear techniques because it is a flexible equalization that can be tuned to criteria more sensible than plain decorrelation (either full information maximization or error minimization); and (ii) it makes no a priori functional assumption about the nonlinearity, so the saturations emerge directly from the scene data and the goal (not from an assumed functional form). It turns out that the optimal responses derived from these more sensible criteria and SPCA are consistent with dysfunctional behaviors such as aftereffects.