Abstract-High-definition video over IP-based networks (IPTV) has become a mainstay of today's consumer environment. In most applications, encoders conforming to the H.264/AVC standard are used. But even within a single standard, a wide range of coding tools is often available that can deliver vastly different visual quality. In this contribution, we therefore evaluate different coding technologies: different encoder settings of H.264/AVC, but also a completely different encoder, Dirac. We cover a wide range of bitrates, from ADSL to VDSL, and different content with low and high demands on the encoders. As PSNR is not well suited to describing perceived visual quality, we conducted extensive subjective tests to determine the visual quality. Our results show that, for currently common bitrates, the visual quality can more than double if the same coding technology is used with different coding tools.
Abstract-Quality of Experience is becoming increasingly important in signal processing applications. Taking inspiration from chemometrics, we provide an introduction to the design of video quality metrics using data analysis methods that differ from traditional approaches. These methods do not require a complete understanding of the human visual system. We use multidimensional data analysis, an extension of well-established data analysis techniques that allows us to better exploit higher-dimensional data. In the case of video quality metrics, it enables us to exploit the temporal properties of video more thoroughly: the complete three-dimensional structure of the video cube is taken into account in the metric's design. Starting with the well-known principal component analysis and an introduction to the notation of multi-way arrays, we then present their multidimensional extensions, which deliver better quality prediction results. Although we focus on video quality, the presented design principles can easily be adapted to other modalities and to even higher-dimensional datasets.

Quality of Experience (QoE) is a relatively new concept in signal processing that aims to describe how video, audio and multi-modal stimuli are perceived by human observers. In the field of video quality assessment, researchers are often interested in how the overall experience is influenced by different video coding technologies, transmission errors or general viewing conditions. The focus is no longer on measurable physical quantities, but rather on how the stimuli are subjectively experienced and whether they are perceived to be of acceptable quality from a subjective point of view. QoE stands in contrast to the well-established Quality of Service (QoS), where we measure signal fidelity, i.e. how much a signal is degraded by noise or other disturbances during processing.
This is usually done by comparing the distorted signal with the original, which gives us a measure of the signal's quality. To understand why QoS is not sufficient for capturing the subjective perception of quality, let us take a quick look at the most popular QoS metric in signal processing, the mean squared error (MSE). The MSE is known not to correlate well with the human perception of quality, as it merely measures the difference between pixel values in two images. The example in Fig. 1 illustrates this problem: both images on the left have the same MSE with respect to the original image. Yet we perceive the upper image, distorted by coding artefacts, to be of worse visual quality than the lower image, where we only changed the contrast slightly. Further discussion of this problem can be found in [1].

I. HOW TO MEASURE QUALITY OF EXPERIENCE

How, then, can we measure QoE? The most direct way is to conduct tests with human observers, who judge the visual quality of video material and thus provide information about the subjectively perceived quality. However, we face a problem in real life: these tests a...
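The MSE's blindness to the *character* of a distortion can be reproduced numerically. The following minimal Python sketch (using NumPy; the random image and the distortion magnitudes are illustrative assumptions, not the actual images of Fig. 1) constructs two perceptually very different distortions that nonetheless yield the same MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.uniform(0, 255, size=(64, 64))  # toy stand-in for an image

shift = 4.0
# Distortion B: a uniform global shift (a mild brightness change)
distorted_b = original + shift
# Distortion A: per-pixel noise of the same magnitude but random sign,
# which looks far more disturbing than a uniform shift
distorted_a = original + rng.choice([-shift, shift], size=original.shape)

def mse(x, y):
    """Mean squared error between two equally sized arrays."""
    return np.mean((x - y) ** 2)

# Both distortions have (up to floating-point rounding) the same MSE,
# even though a viewer would judge their quality very differently.
print(mse(original, distorted_a))  # ~16.0
print(mse(original, distorted_b))  # ~16.0
```

Because every per-pixel error has magnitude 4 in both cases, the MSE is about 16 for both, although noise and a brightness shift are perceived very differently.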
The well-established static head-related transfer function (HRTF) measurement approaches using maximum length sequences and sine sweeps are compared with a recent HRTF estimation approach using normalized least mean square (NLMS) adaptive filters, which allows the person being measured to move continuously during the recording of the excitation signal. With continuous-movement HRTF measurement, considerable time can be saved in individual HRTF estimation when creating a dense HRTF database for headphone-based sound synthesis or for applications such as crosstalk cancellation in loudspeaker-based sound synthesis. The different approaches are implemented and compared experimentally by objective and subjective evaluation.
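As a rough illustration of the adaptive-filter idea (not the measurement setup of the paper), the following Python sketch uses the NLMS update to identify a short toy impulse response standing in for an HRTF. The filter length, step size, and signal length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "unknown system" impulse response (a stand-in for an HRTF)
h_true = np.array([0.5, -0.3, 0.2, 0.1])
N = len(h_true)

x = rng.standard_normal(5000)        # excitation signal
d = np.convolve(x, h_true)[:len(x)]  # system output (desired signal)

# NLMS adaptation: adjust h so that h * x approximates d
h = np.zeros(N)
mu, eps = 0.5, 1e-8                  # step size, regularization
for n in range(N, len(x)):
    u = x[n - N + 1:n + 1][::-1]     # most recent N input samples
    e = d[n] - h @ u                 # a priori estimation error
    h += mu * e * u / (u @ u + eps)  # normalized gradient update

# After adaptation, h has converged toward h_true
```

In the noiseless case shown here the estimate converges quickly; in a real continuous-movement measurement, the excitation, noise, and time-variation of the true response make the choice of step size a trade-off between tracking speed and estimation accuracy.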
Sound source localization algorithms determine the physical position of a sound source with respect to a listener. For practical applications, a localization algorithm must take into account real-world conditions such as multiple active sources, reverberation, and noise. The application can impose additional constraints on the algorithm, e.g., a requirement for low latency. This work defines the most important constraints for practical applications, introduces an algorithm that tries to fulfill all requirements as well as possible, and compares it to state-of-the-art sound source localization approaches.
The aim of any video quality metric is to deliver a quality prediction similar to the video quality perceived by human observers. One way to design such a model of human perception is by data analysis. In this contribution we extend this approach to the temporal dimension. Even though video obviously consists of spatial and temporal dimensions, the temporal aspect is often not considered adequately. Instead of including this third dimension in the model itself, metrics are usually applied on a frame-by-frame basis and then temporally pooled, commonly by averaging. We therefore propose to skip the temporal pooling step and to use the additional temporal dimension in the model-building step of the video quality metric. We propose to use the two-dimensional extension of principal component regression (PCR), the 2D-PCR, in order to obtain an improved model. We conducted extensive subjective tests with different HDTV video sequences at 1920 × 1080 and 25 frames per second. For verification, we performed a cross-validation to obtain a measure of the real-life performance of the acquired model. Finally, we show that the direct inclusion of the temporal dimension of video into the model building significantly improves the prediction accuracy of the visual quality.
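To make the contrast between frame-by-frame modeling and multidimensional modeling concrete, the following Python sketch (a toy example on random data, not the actual 2D-PCR metric) compares flattening a video cube for per-frame analysis with the mode-n unfoldings on which multi-way methods operate:

```python
import numpy as np

# Toy "video cube": frames x height x width
rng = np.random.default_rng(1)
video = rng.standard_normal((10, 6, 8))

# Frame-by-frame approach: flatten each frame into a row vector,
# discarding the spatial arrangement and treating frames independently
X = video.reshape(video.shape[0], -1)                     # 10 x 48
U, s, Vt = np.linalg.svd(X - X.mean(axis=0),
                         full_matrices=False)             # per-frame PCA

# Multi-way view: mode-n unfolding keeps every mode of the cube
# accessible, so temporal structure can enter the model directly
def unfold(tensor, mode):
    """Matricize a tensor along the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

X1 = unfold(video, 0)   # 10 x 48  (time mode)
X2 = unfold(video, 1)   # 6 x 80   (height mode)
X3 = unfold(video, 2)   # 8 x 60   (width mode)
```

Multi-way methods such as 2D-PCR build their factors from these unfoldings jointly rather than from a single flattened matrix, which is how the temporal dimension enters the model instead of being averaged away in a pooling step.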