We present the results of a recent large-scale subjective study of video quality on a collection of videos distorted by a variety of application-relevant processes. Methods to assess the visual quality of digital videos as perceived by human observers are becoming increasingly important, due to the large number of applications that target humans as the end users of video. Owing to the many approaches to video quality assessment (VQA) that are being developed, there is a need for a diverse, independent, public database of distorted videos and subjective scores that is freely available. The resulting Laboratory for Image and Video Engineering (LIVE) Video Quality Database contains 150 distorted videos (obtained from ten uncompressed reference videos of natural scenes) that were created using four different commonly encountered distortion types. Each video was assessed by 38 human subjects, and the difference mean opinion scores (DMOS) were recorded. We also evaluated the performance of several state-of-the-art, publicly available full-reference VQA algorithms on the new database and present a statistical evaluation of their relative performance. The database has a dedicated web presence that will be maintained as long as it remains relevant, and the data are freely available online.
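As a minimal sketch of how difference mean opinion scores might be derived from raw subjective ratings, the following Python fragment computes per-subject difference scores against the reference, z-scores them to remove per-subject bias and scale, and averages across subjects. The function name, the matrix layout, and the z-scoring step are illustrative assumptions, not the LIVE study's actual processing pipeline.

```python
import numpy as np

def compute_dmos(raw_scores, ref_index):
    """Illustrative DMOS computation from a (subjects x videos) matrix of raw
    opinion scores, assuming each video's reference rating is in column ref_index."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    # Difference scores: reference rating minus distorted rating, per subject.
    diff = raw_scores[:, [ref_index]] - raw_scores
    # Z-score each subject's difference scores to remove individual bias and scale.
    z = (diff - diff.mean(axis=1, keepdims=True)) / diff.std(axis=1, keepdims=True)
    # DMOS per video: mean z-scored difference across subjects (often rescaled to 0-100).
    return z.mean(axis=0)
```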
Measurement of image or video quality is crucial for many image-processing applications, such as acquisition, compression, restoration, enhancement, and reproduction. Traditionally, image quality assessment (QA) algorithms interpret image quality as similarity with a "reference" or "perfect" image. The obvious limitation of this approach is that the reference image or video may not be available to the QA algorithm. The field of blind, or no-reference, QA, in which image quality is predicted without access to the reference image or video, remains largely unexplored, with existing algorithms focusing mostly on measuring blocking artifacts. Emerging image and video compression technologies can avoid the dreaded blocking artifact by using various mechanisms, but they introduce other types of distortions, specifically blurring and ringing. In this paper, we propose to use natural scene statistics (NSS) to blindly measure the quality of images compressed by JPEG2000 (or any other wavelet-based) image coder. We claim that natural scenes contain nonlinear dependencies that are disturbed by the compression process, and that this disturbance can be quantified and related to human perception of quality. We train and test our algorithm with data from human subjects, and show that reasonably comprehensive NSS models can help us make blind, but accurate, predictions of quality. Our algorithm performs close to the limit imposed on useful prediction by the variability between human subjects.
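The following is a minimal sketch, assuming the PyWavelets package is available, of extracting simple wavelet-subband statistics of the kind NSS-based blind QA methods rely on. The choice of statistic (subband kurtosis) and any mapping from these features to a quality score are illustrative assumptions, not the paper's actual model.

```python
import numpy as np
import pywt

def wavelet_nss_features(image, wavelet="db2", levels=3):
    """Illustrative NSS features: kurtosis of each wavelet detail subband.
    Natural images have heavy-tailed (high-kurtosis) wavelet coefficients;
    quantization in wavelet-based compression tends to alter these statistics."""
    coeffs = pywt.wavedec2(np.asarray(image, dtype=float), wavelet, level=levels)
    feats = []
    for detail_level in coeffs[1:]:        # skip the coarse approximation subband
        for band in detail_level:          # horizontal, vertical, diagonal details
            c = band.ravel() - band.mean()
            kurt = np.mean(c**4) / (np.mean(c**2) ** 2 + 1e-12)
            feats.append(kurt)
    return np.array(feats)
```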
We develop a framework for assessing the quality of stereoscopic images that have been afflicted by possibly asymmetric distortions. An intermediate image is generated which, when viewed stereoscopically, is designed to have a perceived quality close to that of the cyclopean image. We hypothesize that performing stereoscopic QA on this intermediate image yields higher correlations with human subjective judgments than applying 2D QA directly to the left and right views. The experimental results confirm the hypothesis and show that the proposed framework significantly outperforms conventional 2D QA metrics when predicting the quality of stereoscopically viewed images that may have been asymmetrically distorted.
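Below is a minimal sketch of forming a linearly weighted "cyclopean" intermediate image from a stereopair, with weights derived from local contrast energy as a crude stand-in for binocular-rivalry dominance. The weighting scheme, the integer disparity compensation, and the function names are simplified assumptions rather than the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def cyclopean_image(left, right, disparity):
    """Illustrative cyclopean synthesis: combine the left view and the
    disparity-compensated right view, weighted by local contrast energy."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    # Shift each right-view pixel by its (integer) horizontal disparity.
    rows, cols = np.indices(left.shape)
    shifted = np.clip(cols + np.rint(disparity).astype(int), 0, left.shape[1] - 1)
    right_comp = right[rows, shifted]
    # Local energy of a bandpass residual, used as a dominance weight.
    def local_energy(img):
        return gaussian_filter((img - gaussian_filter(img, 2)) ** 2, 2) + 1e-6
    w_l, w_r = local_energy(left), local_energy(right_comp)
    return (w_l * left + w_r * right_comp) / (w_l + w_r)
```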
We develop a no-reference binocular image quality assessment model that operates on static stereoscopic images. The model deploys 2D and 3D natural scene statistics (NSS) features extracted from stereopairs to assess the perceptual quality they present when viewed stereoscopically. Both symmetrically and asymmetrically distorted stereopairs are handled by accounting for binocular rivalry using a classic linear rivalry model. The NSS features are used to train a support vector machine model to predict the quality of a tested stereopair. The model is tested on the LIVE 3D Image Quality Database, which includes both symmetrically and asymmetrically distorted stereoscopic 3D images. The experimental results show that our proposed model significantly outperforms conventional 2D full-reference QA algorithms applied to stereopairs, as well as 3D full-reference IQA algorithms on asymmetrically distorted stereopairs.
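As a minimal sketch of the regression step, the following fragment uses scikit-learn's SVR to map per-stereopair feature vectors to subjective quality scores. The feature extraction itself is outside the sketch, and the kernel and hyperparameter values are placeholders, not the values used in the paper.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_quality_model(features, dmos):
    """features: (n_stereopairs, n_features) array of 2D/3D NSS features.
    dmos: (n_stereopairs,) subjective scores used as regression targets."""
    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=100.0, gamma="scale"))
    model.fit(features, dmos)
    return model

# Example usage (hypothetical arrays):
# predicted = train_quality_model(train_feats, train_dmos).predict(test_feats)
```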
How does the primate visual system encode three-dimensional motion? The macaque middle temporal area (MT) and the human MT complex (MT+) have well-established sensitivity to two-dimensional frontoparallel motion and static disparity. However, evidence for sensitivity to three-dimensional motion has remained elusive. We found that human MT+ encodes two binocular cues to three-dimensional motion: changing disparities over time and interocular comparisons of retinal velocities. By varying important properties of moving dot displays, we distinguished these three-dimensional motion signals from their constituents, instantaneous binocular disparity and monocular retinal motion. An adaptation experiment confirmed direction selectivity for three-dimensional motion. Our results indicate that MT+ carries critical binocular signals for three-dimensional motion processing, revealing an important and previously overlooked role for this well-studied brain area.
We introduce a novel framework for estimating visual sensitivity using a continuous target-tracking task in concert with a dynamic internal model of human visual performance. Observers used a mouse cursor to track the center of a two-dimensional Gaussian luminance blob as it moved in a random walk in a field of dynamic additive Gaussian luminance noise. To estimate visual sensitivity, we fit a Kalman filter model to the human tracking data under the assumption that humans behave as Bayesian ideal observers. Such observers optimally combine prior information with noisy observations to produce an estimate of target position at each time step. We found that estimates of human sensory noise obtained from the Kalman filter fit were highly correlated with traditional psychophysical measures of human sensitivity (R² > 97%). Because each frame of the tracking task is effectively a "minitrial," this technique reduces the amount of time required to assess sensitivity compared with traditional psychophysics. Furthermore, because the task is fast, easy, and fun, it could be used to assess children, certain clinical patients, and other populations that may get impatient with traditional psychophysics. Importantly, the modeling framework provides estimates of decision variable variance that are directly comparable with those obtained from traditional psychophysics. Further, we show that easily computed summary statistics of the tracking data can also accurately predict relative sensitivity (i.e., traditional sensitivity to within a scale factor).
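The following is a minimal sketch of a one-dimensional Kalman filter for a random-walk target, of the kind one might fit to tracking data, with the observation noise variance standing in for the observer's sensory noise. The parameter values and the simple random-walk dynamics are assumptions for illustration, not the fitted model from the study.

```python
import numpy as np

def kalman_track(observations, q=1.0, r=4.0):
    """1D Kalman filter for a random-walk target.
    q: variance of the target's random-walk steps (process noise).
    r: variance of the noisy position observations (sensory noise);
       fitting r to human tracking data yields the sensitivity estimate."""
    x_hat, p = 0.0, 1.0                     # posterior mean and variance
    estimates = []
    for z in observations:
        p_pred = p + q                      # predict: random-walk dynamics
        k = p_pred / (p_pred + r)           # Kalman gain
        x_hat = x_hat + k * (z - x_hat)     # update with the new observation
        p = (1.0 - k) * p_pred
        estimates.append(x_hat)
    return np.array(estimates)
```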
The ability to automatically detect visually interesting regions in images has many practical applications, especially in the design of active machine vision and automatic visual surveillance systems. Analysis of the statistics of image features at observers' gaze can provide insights into the mechanisms of fixation selection in humans. Using a foveated analysis framework, we studied the statistics of four low-level local image features: luminance, contrast, and bandpass outputs of both luminance and contrast, and discovered that image patches around human fixations had, on average, higher values of each of these features than image patches selected at random. Contrast-bandpass showed the greatest difference between human and random fixations, followed by luminance-bandpass, RMS contrast, and luminance. Using these measurements, we present a new algorithm that selects image regions as likely candidates for fixation. These regions are shown to correlate well with fixations recorded from human observers.
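As a minimal sketch of this kind of analysis, the fragment below compares one low-level statistic (RMS contrast) in patches centered on recorded fixations versus randomly chosen locations. The patch size, the single statistic, and the function name are illustrative choices, not the study's full foveated analysis framework.

```python
import numpy as np

def mean_patch_rms_contrast(image, points, half=16):
    """Mean RMS contrast of square patches centered on the given (row, col) points."""
    img = np.asarray(image, dtype=float)
    vals = []
    for r, c in points:
        patch = img[max(r - half, 0):r + half, max(c - half, 0):c + half]
        m = patch.mean()
        if m > 0:
            vals.append(patch.std() / m)    # RMS contrast relative to mean luminance
    return float(np.mean(vals))

# Compare fixated vs. random locations (hypothetical inputs):
# fix_contrast = mean_patch_rms_contrast(img, human_fixations)
# rnd_contrast = mean_patch_rms_contrast(img, random_points)
```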