Subjective assessment methods have been used reliably for many years to evaluate video quality, and they remain the most reliable assessments compared with objective methods. Subjective evaluation, however, is costly to conduct and cannot easily be used to monitor video quality in real time. Furthermore, traditional analog objective methods, while still necessary, are not sufficient to measure the quality of digitally compressed video systems. There is therefore a need to develop new objective methods that exploit the characteristics of the human visual system. While several new objective methods have been developed, there is to date no internationally standardized method. The Video Quality Experts Group (VQEG) was formed in October 1997 to address video quality issues. The group is composed of experts from various backgrounds and affiliations, including participants from several internationally recognized organizations working in the field of video quality assessment. The majority of participants are active in the International Telecommunication Union (ITU), and VQEG combines the expertise and resources found in several ITU Study Groups to work toward a common goal. The first task undertaken by VQEG was to validate objective video quality measurement methods, leading to Recommendations in both the Telecommunication (ITU-T) and Radiocommunication (ITU-R) sectors of the ITU. To this end, VQEG designed and executed a test program comparing subjective video quality evaluations with the predictions of a number of proposed objective measurement methods for video quality in the bit-rate range of 768 kb/s to 50 Mb/s. The results of this test show that no objective measurement system is currently able to replace subjective testing. Depending on the metric used for evaluation, the performance of eight or nine models was found to be statistically equivalent, leading to the conclusion that no single model outperforms the others in all cases. The greatest achievement of this first validation effort is the unique data set assembled to support the future development of objective models.
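VQEG-style validation compares each model's predictions against subjective scores using metrics such as Pearson and Spearman correlation. The sketch below illustrates that kind of comparison; the arrays are hypothetical stand-ins for subjective scores and model outputs, not VQEG data.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical subjective scores (e.g., DMOS) and one model's predictions
dmos = np.array([12.1, 25.4, 33.0, 41.7, 55.2, 63.8])
model = np.array([10.5, 27.0, 30.9, 45.1, 52.6, 66.0])

r, _ = pearsonr(model, dmos)     # prediction accuracy (linearity)
rho, _ = spearmanr(model, dmos)  # prediction monotonicity (rank order)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Two models whose scores are statistically equivalent under such metrics cannot be ranked against each other, which is how "eight or nine models" could tie in the test above.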
Many models of visual performance predict image discriminability: the visibility of the difference between a pair of images. We compared the ability of three image discrimination models to predict the detectability of objects embedded in natural backgrounds. The three models were a multiple-channel Cortex-transform model with within-channel masking, a single-channel contrast-sensitivity-filter model, and a digital image difference metric. Each model used a Minkowski distance metric (a generalized vector magnitude) to summate absolute differences between the background image and the object-plus-background image. For each model, this summation was implemented with three different exponents: 2, 4, and infinity. In addition, each combination of model and summation exponent was implemented with and without a simple contrast gain factor. The model outputs were compared with measures of object detectability obtained from 19 observers. Among the models without the contrast gain factor, the multiple-channel model with a summation exponent of 4 performed best, predicting the pattern of observer d′ values with an RMS error of 2.3 dB. The contrast gain factor improved the predictions of all three models for all three exponents. With the factor, the best exponent was 4 for all three models, and their prediction errors were near 1 dB. These results demonstrate that image discrimination models can predict the relative detectability of objects in natural scenes.
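As an illustration of the summation step described above, here is a minimal sketch of Minkowski pooling over a per-pixel difference map. The function name and inputs are hypothetical; the infinite exponent is realized as the maximum, which is its analytic limit.

```python
import numpy as np

def minkowski_pool(diff_map, beta):
    """Pool absolute differences with a Minkowski exponent.

    diff_map : array of per-pixel (or per-channel) model differences
    beta     : summation exponent; np.inf reduces to the maximum
    """
    d = np.abs(np.asarray(diff_map, dtype=float)).ravel()
    if np.isinf(beta):
        return d.max()          # beta -> infinity: the peak difference dominates
    return (d ** beta).sum() ** (1.0 / beta)

# Higher exponents weight the largest local differences more heavily
diffs = np.array([0.1, 0.2, 0.9])
for b in (2, 4, np.inf):
    print(b, minkowski_pool(diffs, b))
```

An exponent of 2 treats the difference map as an energy sum, while 4 moves toward peak detection; the finding that 4 fits best suggests detection is dominated by the largest local differences without being decided by a single point.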
Studies of visual attention and eye movements have shown that people generally attend to only a few areas in typical scenes. These areas are commonly referred to as regions of interest (ROIs). When scenes are viewed with the same context and motivation (e.g., a typical entertainment scenario), these ROIs are often highly correlated among different people, motivating the development of computational models of visual attention. This paper describes a novel model of visual attention designed to provide an accurate and robust prediction of a viewer's locus of attention across a wide range of typical video content. The model has been calibrated and verified using data gathered in an experiment in which the eye movements of 24 viewers were recorded while viewing material from a large database of still (130 images) and video (~13 minutes) scenes. Certain characteristics of the scene content, such as moving objects, people, foreground objects, and centrally located objects, were found to exert a strong influence on viewers' attention. The results of comparing model predictions to experimental data demonstrate a strong correlation between the predicted ROIs and viewers' fixations.
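The abstract does not give the model's equations. Purely as a hedged illustration, an attention model driven by the cues listed above might combine normalized feature maps (motion, people, foreground, central bias) with per-feature weights; everything below, including names and weights, is hypothetical rather than the authors' method.

```python
import numpy as np

def predict_roi(feature_maps, weights):
    """Combine normalized feature maps into an attention map (illustrative sketch).

    feature_maps : dict of 2-D arrays, e.g. {"motion": ..., "people": ...}
    weights      : dict of scalar importance weights per feature
    Returns the attention map and the (row, col) of its peak, the predicted ROI.
    """
    h, w = next(iter(feature_maps.values())).shape
    attention = np.zeros((h, w))
    for name, fmap in feature_maps.items():
        rng = fmap.max() - fmap.min()
        norm = (fmap - fmap.min()) / rng if rng > 0 else np.zeros_like(fmap)
        attention += weights.get(name, 1.0) * norm
    peak = np.unravel_index(np.argmax(attention), attention.shape)
    return attention, peak
```

Calibration, in this framing, amounts to choosing the weights so that predicted peaks best match recorded fixations.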
To determine whether a parabolic template is a good description of the contrast-sensitivity functions (CSF's) exhibited by older adults, the curve-fitting method of Pelli et al. [J. Opt. Soc. Am. A 3(13), P56 (1986)] was applied to contrast-sensitivity data from 100 older subjects (ages 53-85 years). Although the method resulted in reasonable fits for most subjects, closer inspection revealed that this technique may be problematic. A significant number of observers had functions that were nonparabolic, and for many subjects the error tended to be concentrated at the peak of the CSF. In addition, in contrast to the study of Pelli et al., the peak contrast sensitivities of the subjects were only weakly related to Pelli-Robson contrast sensitivity and letter acuity. The data were also fitted with an asymmetric function of variable shape. Whereas this function provided a better fit to the nonparabolic CSF's, it resulted in inferior fits to most of the remaining data. These results demonstrate that the spatial CSF's of older adults cannot be described by a single parametric curve such as a parabola or a function of an exponential and that Pelli-Robson contrast sensitivity and letter acuity are not adequate predictors of their peak contrast sensitivities.
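The parabolic template referred to above fits log contrast sensitivity as a parabola in log spatial frequency. Below is a minimal curve-fitting sketch; the parameterization and the sample data are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_parabola(log_f, peak_sens, log_f_peak, width):
    """Assumed parabolic CSF template in log-log coordinates:
    log10 sensitivity falls off quadratically from its peak."""
    return peak_sens - ((log_f - log_f_peak) / width) ** 2

# Hypothetical data: spatial frequencies (c/deg) and log10 contrast sensitivity
freqs = np.array([0.5, 1, 2, 4, 8, 16])
log_sens = np.array([1.6, 1.9, 2.1, 2.0, 1.5, 0.7])

params, _ = curve_fit(log_parabola, np.log10(freqs), log_sens,
                      p0=[2.0, np.log10(3.0), 1.0])
print("peak log sensitivity %.2f at %.1f c/deg" % (params[0], 10 ** params[1]))
```

A template like this forces symmetry about the peak on log axes, which is exactly where the abstract reports the fits failing: many older observers' CSFs are asymmetric or misshapen near the peak.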
The contrast dependence of perceived depth was quantified through a series of depth matching experiments. Perceived depth was found to be a power law function of contrast. In addition, subjects exhibited a large uncrossed depth bias, indicating that low contrast test patterns appeared much farther away than high contrast patterns of equal disparity. For disparities in the range of ±4.0 arcmin, matching disparities for low contrast patterns were shifted in the uncrossed direction by the same amount. In other words, while the magnitude of the uncrossed depth bias is a power law function of contrast, it is constant with respect to disparity. In a second series of experiments, the contrast dependence of stereo increment thresholds was measured. Like perceived depth and stereoacuity, stereo increment thresholds were found to be a power law function of contrast. These results suggest that contrast effects occur at or before the extraction of depth and have implications for the response properties of disparity-selective mechanisms.
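The abstract reports three power-law relations, restated compactly below. The symbols are illustrative rather than the paper's notation, and the negative exponents encode only the qualitative directions described or typical of the literature (lower contrast yields a larger uncrossed bias and, typically, higher thresholds).

```latex
% Hedged restatement; \gamma, b_0, \mu, \kappa are illustrative fitted parameters.
\[
  d(C) \propto C^{\gamma}, \qquad
  b(C) = b_0\, C^{-\mu} \ \text{(independent of disparity)}, \qquad
  \Delta\delta(C) \propto C^{-\kappa}
\]
```

Here \(d(C)\) is perceived depth at fixed disparity, \(b(C)\) the uncrossed depth bias, and \(\Delta\delta(C)\) the stereo increment threshold, each as a function of contrast \(C\).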