This paper presents an efficient metric for quantifying the visual fidelity of natural images based on near-threshold and suprathreshold properties of human vision. The proposed metric, the visual signal-to-noise ratio (VSNR), operates via a two-stage approach. In the first stage, contrast thresholds for detection of distortions in the presence of natural images are computed via wavelet-based models of visual masking and visual summation in order to determine whether the distortions in the distorted image are visible. If the distortions are below the threshold of detection, the distorted image is deemed to be of perfect visual fidelity (VSNR = infinity) and no further analysis is required. If the distortions are suprathreshold, a second stage is applied, which operates on the low-level visual property of perceived contrast and the mid-level visual property of global precedence. These two properties are modeled as Euclidean distances in distortion-contrast space of a multiscale wavelet decomposition, and VSNR is computed based on a simple linear sum of these distances. The proposed VSNR metric is generally competitive with current metrics of visual fidelity; it is efficient both in terms of its low computational complexity and in terms of its low memory requirements; and it operates on physical luminances and visual angle (rather than on digital pixel values and pixel-based dimensions) to accommodate different viewing conditions.
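The abstract describes the two-stage logic but gives no formulas, so the sketch below only illustrates its shape under stated assumptions. The function name `toy_vsnr`, the use of per-scale contrasts as plain numbers, the "ideal" precedence-preserving contrasts, and the weighting constants (`alpha = 0.04` and the 1/sqrt(2) factor) are all assumptions for illustration, not the published implementation; the masking and summation models that would produce the inputs are not implemented.

```python
import math
from typing import Sequence


def toy_vsnr(image_contrast: float,
             distortion_contrasts: Sequence[float],
             detection_thresholds: Sequence[float],
             ideal_contrasts: Sequence[float],
             alpha: float = 0.04) -> float:
    """Toy two-stage VSNR-style score in dB from hypothetical per-scale contrasts."""
    # Stage 1: if every scale's distortion contrast is below its detection
    # threshold, treat the distortions as invisible (perfect fidelity).
    if all(c < t for c, t in zip(distortion_contrasts, detection_thresholds)):
        return math.inf

    # Stage 2a: perceived-contrast distance, taken here as the Euclidean norm
    # of the per-scale distortion contrasts (an assumption of this sketch).
    d_pc = math.sqrt(sum(c * c for c in distortion_contrasts))

    # Stage 2b: global-precedence distance, the Euclidean distance between the
    # actual per-scale contrasts and a hypothetical precedence-preserving allocation.
    d_gp = math.sqrt(sum((c - g) ** 2
                         for c, g in zip(distortion_contrasts, ideal_contrasts)))

    # Linear combination of the two distances; alpha and 1/sqrt(2) are assumed weights.
    visual_distortion = alpha * d_pc + (1.0 - alpha) * d_gp / math.sqrt(2.0)
    if visual_distortion == 0.0:
        return math.inf
    return 20.0 * math.log10(image_contrast / visual_distortion)
```

For example, `toy_vsnr(1.0, [0.3, 0.4], [0.1, 0.1], [0.3, 0.4])` has zero global-precedence distance and returns about 33.98 dB, while moving the ideal contrasts away from the actual ones lowers the score.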
Image quality assessment (IQA) has been a topic of intense research over the last several decades. Each year brings an increasing number of new IQA algorithms, extensions of existing IQA algorithms, and applications of IQA to other disciplines. In this article, I first provide an up-to-date review of research in IQA and then highlight several open challenges in this field. The first half of this article discusses key properties of visual perception, image quality databases, and existing full-reference, no-reference, and reduced-reference IQA algorithms. Yet, despite the remarkable progress that has been made in IQA, many fundamental challenges remain largely unsolved. The second half of this article highlights some of these challenges. I specifically discuss challenges related to the lack of complete perceptual models for natural images, for compound and suprathreshold distortions, and for multiple distortions and the interactive effects of these distortions on the images. I also discuss challenges related to IQA of images containing nontraditional …, as well as challenges related to computational efficiency. The goal of this article is not only to help practitioners and researchers keep abreast of recent advances in IQA, but also to raise awareness of the key limitations of current IQA knowledge.
This paper presents an algorithm designed to measure the local perceived sharpness in an image. Our method utilizes both spectral and spatial properties of the image: for each block, we measure the slope of the magnitude spectrum and the total spatial variation. These measures are then adjusted to account for visual perception, and the adjusted measures are combined via a weighted geometric mean. The resulting measure, S3 (spectral and spatial sharpness), yields a perceived-sharpness map in which greater values denote perceptually sharper regions. This map can be collapsed into a single index, which quantifies the overall perceived sharpness of the whole image. We demonstrate the utility of the S3 measure for within-image and across-image sharpness prediction, no-reference image quality assessment of blurred images, and monotonic estimation of the standard deviation of the impulse response used in Gaussian blurring. We further evaluate the accuracy of S3 in local sharpness estimation by comparing S3 maps to sharpness maps generated by human subjects. We show that S3 can generate sharpness maps that are highly correlated with the human-subject maps.
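The per-block computation described above (spectral slope, total variation, weighted geometric mean) can be sketched as follows. This is an illustrative sketch, not the published S3 code: the function names are hypothetical, the slope is fit over all nonzero frequency bins without radial averaging, the sigmoid constants `tau1` and `tau2`, the mean-absolute-difference form of the spatial measure, its normalization by the 8-bit range, and the equal weighting are all assumptions.

```python
import numpy as np


def spectral_sharpness(block: np.ndarray, tau1: float = -3.0, tau2: float = 2.0) -> float:
    """Map the magnitude-spectrum slope of a block to [0, 1] (steeper spectrum = blurrier)."""
    mag = np.abs(np.fft.fft2(block - block.mean()))
    fy = np.fft.fftfreq(block.shape[0])[:, None]
    fx = np.fft.fftfreq(block.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    keep = (radius > 0) & (mag > 0)
    # Least-squares line through log|F| versus log f; blurrier blocks give steeper (more negative) slopes.
    slope, _ = np.polyfit(np.log(radius[keep]), np.log(mag[keep]), 1)
    alpha = -slope
    # Assumed sigmoid: shallow spectra (small alpha) map near 1, steep spectra near 0.
    return float(1.0 - 1.0 / (1.0 + np.exp(tau1 * (alpha - tau2))))


def spatial_sharpness(block: np.ndarray) -> float:
    """Mean absolute neighbor difference, normalized by an assumed 8-bit range."""
    dx = np.abs(np.diff(block, axis=1)).mean()
    dy = np.abs(np.diff(block, axis=0)).mean()
    return float(min(1.0, (dx + dy) / (2 * 255.0)))


def s3_like(block: np.ndarray, weight: float = 0.5) -> float:
    """Weighted geometric mean of the spectral and spatial measures for one block."""
    s1 = spectral_sharpness(block)
    s2 = spatial_sharpness(block)
    return s1 ** weight * s2 ** (1.0 - weight)
```

Applying `s3_like` to each block of an image yields a coarse sharpness map; a white-noise block scores far higher than the same block after Gaussian blurring.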
The mainstream approach to image quality assessment has centered around accurately modeling the single most relevant strategy employed by the human visual system (HVS) when judging image quality (e.g., detecting visible differences and extracting image structure/information). In this work, we suggest that a single strategy may not be sufficient; rather, we advocate that the HVS uses multiple strategies to determine image quality. For images containing near-threshold distortions, the image is most apparent, and thus the HVS attempts to look past the image and look for the distortions (a detection-based strategy). For images containing clearly visible distortions, the distortions are most apparent, and thus the HVS attempts to look past the distortion and look for the image's subject matter (an appearance-based strategy). Here, we present a quality assessment method [most apparent distortion (MAD)], which attempts to explicitly model these two separate strategies. Local luminance and contrast masking are used to estimate detection-based perceived distortion in high-quality images, whereas changes in the local statistics of spatial-frequency components are used to estimate appearance-based perceived distortion in low-quality images. We show that a combination of these two measures can perform well in predicting subjective ratings of image quality.
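The abstract states only that the detection-based and appearance-based measures are combined. One way to realize "rely on detection for high-quality images, on appearance for low-quality images" is an adaptive weighted geometric mean, sketched below. The function names and the constants `beta1` and `beta2` are assumed values for illustration, not taken from this abstract.

```python
def detection_weight(d_detect: float, beta1: float = 0.467, beta2: float = 0.130) -> float:
    """Weight on the detection-based term: 1 at d_detect = 0, shrinking as d_detect grows."""
    return 1.0 / (1.0 + beta1 * d_detect ** beta2)


def mad_combine(d_detect: float, d_appear: float) -> float:
    """Blend detection- and appearance-based distortion via a weighted geometric mean."""
    alpha = detection_weight(d_detect)
    return d_detect ** alpha * d_appear ** (1.0 - alpha)
```

Because the weight depends on `d_detect` itself, nearly invisible distortions are scored almost entirely by the detection model, while heavily distorted images shift weight toward the appearance model.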
In this letter, we present a simple yet effective wavelet-based algorithm for estimating both global and local image sharpness (FISH, Fast Image Sharpness). FISH operates by first decomposing the input image via a three-level separable discrete wavelet transform (DWT). Next, the log-energies of the DWT subbands are computed. Finally, a scalar index corresponding to the image's overall sharpness is computed via a weighted average of these log-energies. Testing on several image databases demonstrates that, despite its simplicity, FISH is competitive with the currently best-performing techniques both for sharpness estimation and for no-reference image quality assessment.
Index Terms: Blur, image quality, local image sharpness, sharpness, wavelet.
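The three FISH steps (three-level separable DWT, subband log-energies, weighted average) can be sketched with a hand-written Haar transform so that no wavelet library is required. The Haar filter, the `log10(1 + mean square)` energy, the 0.8 emphasis on the diagonal (HH) band, and the weights 2^(levels - n) favoring finer levels are assumptions of this sketch; the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt2(x: np.ndarray):
    """One level of a separable 2-D Haar DWT; returns (LL, LH, HL, HH)."""
    lo = (x[0::2, :] + x[1::2, :]) / SQRT2  # low-pass over vertical pairs
    hi = (x[0::2, :] - x[1::2, :]) / SQRT2  # high-pass over vertical pairs
    ll = (lo[:, 0::2] + lo[:, 1::2]) / SQRT2
    lh = (lo[:, 0::2] - lo[:, 1::2]) / SQRT2
    hl = (hi[:, 0::2] + hi[:, 1::2]) / SQRT2
    hh = (hi[:, 0::2] - hi[:, 1::2]) / SQRT2
    return ll, lh, hl, hh


def log_energy(band: np.ndarray) -> float:
    """Log-energy of a subband: log10(1 + mean squared coefficient)."""
    return float(np.log10(1.0 + np.mean(band ** 2)))


def fish_like(image: np.ndarray, levels: int = 3, alpha: float = 0.8) -> float:
    """Weighted average of per-level subband log-energies, finer levels weighted more."""
    size = 2 ** levels
    h = (image.shape[0] // size) * size
    w = (image.shape[1] // size) * size
    ll = image[:h, :w].astype(float)  # crop so every level halves evenly
    total, weight_sum = 0.0, 0.0
    for n in range(1, levels + 1):
        ll, lh, hl, hh = haar_dwt2(ll)
        e_n = (1.0 - alpha) * (log_energy(lh) + log_energy(hl)) / 2.0 + alpha * log_energy(hh)
        weight = 2.0 ** (levels - n)
        total += weight * e_n
        weight_sum += weight
    return total / weight_sum
```

A constant image has no detail energy and scores 0, and blurring an image lowers its score. Running the same function over overlapping blocks would give a local sharpness map.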
This paper presents an algorithm for video quality assessment, spatiotemporal MAD (ST-MAD), which extends our previous image-based algorithm (MAD [1]) to take into account visual perception of motion artifacts. ST-MAD employs spatiotemporal "images" (STS images [2]) created by taking time-based slices of the original and distorted videos. Motion artifacts manifest in the STS images as spatial artifacts, which allows one to quantify motion-based distortion by using classical image-quality assessment techniques. ST-MAD estimates motion-based distortion by applying MAD's appearance-based model to compare the distorted video's STS images to the original video's STS images. This comparison is further adjusted by using optical-flow-derived weights designed to give greater precedence to fast-moving regions located toward the center of the video. Testing on the LIVE video database demonstrates that ST-MAD performs well in predicting video quality.
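The spatiotemporal slicing step can be illustrated in a few lines. This sketch assumes a grayscale video stored as a NumPy array of shape (frames, height, width); the function name `sts_images` is hypothetical, and the MAD comparison and optical-flow weighting are not shown.

```python
import numpy as np


def sts_images(video: np.ndarray):
    """Return spatiotemporal slices of a grayscale video shaped (frames, height, width)."""
    _, height, width = video.shape
    # Fixing a row y gives a (frames x width) image: horizontal motion appears as slanted structure.
    row_slices = [video[:, y, :] for y in range(height)]
    # Fixing a column x gives a (frames x height) image capturing vertical motion.
    col_slices = [video[:, :, x] for x in range(width)]
    return row_slices, col_slices
```

For example, a single bright pixel moving one column to the right per frame traces a diagonal line in its row slice. This is why motion artifacts become spatial artifacts that an image-quality model can compare between the original and distorted slices.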
Natural scenes, like most natural data sets, show considerable redundancy. Although many forms of redundancy have been investigated (e.g., pixel distributions, power spectra, contour relationships), estimates of the true entropy of natural scenes have been largely considered intractable. We describe a technique for estimating the entropy and relative dimensionality of image patches based on a function we call the proximity distribution (a nearest-neighbor technique). The advantage of this function over simple statistics such as the power spectrum is that the proximity distribution is dependent on all forms of redundancy. We demonstrate that this function can be used to estimate the entropy (redundancy) of 3 × 3 patches of known entropy, as well as of 8 × 8 patches of Gaussian white noise, natural scenes, and noise with the same power spectrum as natural scenes. The techniques are based on assumptions regarding the intrinsic dimensionality of the data, and although the estimates depend on an extrapolation model for images larger than 3 × 3, we argue that this approach provides the best current estimates of the entropy and compressibility of natural-scene patches and that it provides insights into the efficiency of any coding strategy that aims to reduce redundancy. We show that the sample of 8 × 8 natural-scene patches used in this study has less than half the entropy of 8 × 8 white noise and less than 60% of the entropy of noise with the same power spectrum. In addition, given a finite number of samples (< 2^20) drawn randomly from the space of 8 × 8 patches, the subspace of 8 × 8 natural-scene patches shows a dimensionality that depends on the sampling density and that, at low densities, is significantly lower than that of 8 × 8 white-noise patches and of noise with the same power spectrum.
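The abstract does not give the proximity-distribution formulas. To illustrate how nearest-neighbor distances alone can yield an entropy estimate, the sketch below substitutes a different, standard technique: the Kozachenko–Leonenko (k = 1) estimator. It is not the authors' method. The function name is hypothetical, samples are assumed distinct (quantized image patches with exact duplicates would need jitter or deduplication), and the result is a differential entropy in nats.

```python
import math

import numpy as np


def kl_entropy(samples: np.ndarray) -> float:
    """Kozachenko-Leonenko (k = 1) differential-entropy estimate in nats for samples of shape (N, d)."""
    n, d = samples.shape
    # Pairwise squared distances via |a|^2 + |b|^2 - 2ab, excluding each point's distance to itself.
    sq = np.sum(samples ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * samples @ samples.T
    np.fill_diagonal(d2, np.inf)
    nn_dist = np.sqrt(np.maximum(d2.min(axis=1), 0.0))
    # Log-volume of the unit d-dimensional ball.
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    # digamma(N) - digamma(1) equals the harmonic number H_{N-1}.
    harmonic = sum(1.0 / k for k in range(1, n))
    return float(harmonic + log_unit_ball + d * np.mean(np.log(nn_dist)))
```

As a sanity check against a known entropy, 1000 samples from a 2-D standard Gaussian should give a value close to log(2πe) ≈ 2.84 nats. More redundant data, such as samples concentrated near a lower-dimensional subspace, produce smaller nearest-neighbor distances and therefore lower estimates.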