In this paper, we present a no-reference blur metric for images and video. The metric is based on an analysis of the spread of edges in an image, and its perceptual significance is validated through subjective experiments. The proposed metric runs in near real-time, has low computational complexity, and is shown to perform well over a range of image content. Potential applications include optimization of source coding, network resource management and autofocus in image capture devices.
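To make the edge-spread idea concrete, here is a minimal toy sketch, not the authors' actual metric: it marks strong horizontal gradients as edges, walks outward to the nearest local extrema on each side, and averages the resulting widths. The gradient threshold and the extrema-walking rule are illustrative assumptions.

```python
import numpy as np

def edge_spread_blur(img, edge_thresh=30.0):
    """Estimate blur as the average horizontal spread of vertical edges.

    img: 2-D grayscale array. Returns the mean edge width in pixels,
    or 0.0 if no edges are found. (Illustrative sketch only.)
    """
    img = np.asarray(img, dtype=float)
    grad = np.diff(img, axis=1)  # horizontal gradient: grad[r][c] = row[c+1] - row[c]
    widths = []
    for r in range(img.shape[0]):
        row = img[r]
        for c in np.nonzero(np.abs(grad[r]) > edge_thresh)[0]:
            g = grad[r][c]
            # walk left while the intensity keeps changing in the edge direction
            left = c
            while left > 0 and (row[left] - row[left - 1]) * g > 0:
                left -= 1
            # walk right likewise, to the local extremum on the other side
            right = c + 1
            while right < len(row) - 1 and (row[right + 1] - row[right]) * g > 0:
                right += 1
            widths.append(right - left)
    return float(np.mean(widths)) if widths else 0.0
```

A wider average edge spread indicates stronger blur, so a smoothed edge should score higher than a sharp step.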
We present full- and no-reference blur metrics as well as a full-reference ringing metric. These metrics are based on an analysis of the edges and adjacent regions in an image and have very low computational complexity. As blur and ringing are typical artifacts of wavelet compression, the metrics are applied to JPEG2000-coded images. Their perceptual significance is corroborated through a number of subjective experiments. The results show that the proposed metrics perform well over a wide range of image content and distortion levels. Potential applications include source coding optimization and network resource management.
In this paper, we propose an efficient, robust, and fast method for the estimation of global motion from image sequences. The method is generic in that it can accommodate various global motion models, from a simple translation to an eight-parameter perspective model. The algorithm is hierarchical and consists of three stages. In the first stage, a low-pass image pyramid is built. Then, an initial translation is estimated with full-pixel precision at the top of the pyramid using a modified n-step search matching. In the third stage, a gradient descent is executed at each level of the pyramid starting from the initial translation at the coarsest level. Due to the coarse initial estimation and the hierarchical implementation, the method is very fast. To increase robustness to outliers, we replace the usual formulation based on a quadratic error criterion with a truncated quadratic function. We have applied the algorithm to various test sequences within an MPEG-4 coding system. From the experimental results we conclude that global motion estimation provides significant performance gains for video material with camera zoom and/or pan. The gains result from a reduced prediction error and a more compact representation of motion. We also conclude that the robust error criterion can introduce additional performance gains without increasing computational complexity.
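The robust error criterion mentioned above can be written as rho(e) = e² for |e| ≤ T and T² otherwise, so any single outlier contributes at most T² to the total cost instead of dominating it. A minimal sketch, where the threshold value T is an arbitrary assumption:

```python
import numpy as np

T = 10.0  # truncation threshold (illustrative choice)

def truncated_quadratic(err):
    """rho(e) = e^2 for |e| <= T, and T^2 otherwise: outliers
    contribute a bounded cost instead of dominating the estimate."""
    err = np.asarray(err, dtype=float)
    return np.where(np.abs(err) <= T, np.square(err), T * T)

def robust_cost(errors):
    """Total robust cost over a set of prediction errors."""
    return float(np.sum(truncated_quadratic(errors)))
```

With a plain quadratic criterion, the outlier error of 100 below would add 10000 to the cost; under the truncated criterion it adds only T² = 100.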
The key to high performance in image sequence coding lies in an efficient reduction of the temporal redundancies. For this purpose, motion estimation and compensation techniques have been successfully applied. This paper studies motion estimation algorithms in the context of first-generation coding techniques commonly used in digital TV. In this framework, estimating the motion in the scene is not an intrinsic goal. Motion estimation should provide good temporal prediction while simultaneously requiring low overhead information. More specifically, the aim is to globally minimize the bandwidth corresponding to both the prediction error information and the motion parameters. This paper first clarifies the notion of motion, reviews classical motion estimation techniques, and outlines new perspectives. Block matching techniques are shown to be the most appropriate in the framework of first-generation coding. To overcome the drawbacks characteristic of most block matching techniques, this paper proposes a new locally adaptive multigrid block matching motion estimation technique. This algorithm has been designed taking the above aims into account. It leads to a robust motion field estimation, precise prediction along moving edges and a decreased amount of side information in uniform areas. Furthermore, the algorithm controls the accuracy of the motion estimation procedure in order to optimally balance the amount of information corresponding to the prediction error and to the motion parameters. Experimental results show that the technique results in greatly enhanced visual quality and significant savings in bit rate when compared to classical block matching techniques.
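For reference, the classical exhaustive block matching that the proposed multigrid technique improves upon can be sketched as follows. This is a generic textbook version, not the paper's algorithm; the function name, the sum-of-absolute-differences (SAD) criterion and the search-window handling are illustrative choices.

```python
import numpy as np

def best_match(block, ref, top, left, radius):
    """Exhaustive block matching: return the (dy, dx) displacement
    within +/-radius that minimizes the sum of absolute differences
    (SAD) between `block` and the reference frame `ref`.

    (top, left) is the block's position in the current frame."""
    h, w = block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # skip candidates that fall outside the reference frame
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = np.abs(block.astype(float) - ref[y:y + h, x:x + w]).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```

Exhaustive search is O(radius²) SAD evaluations per block, which is exactly the cost that hierarchical and multigrid schemes are designed to reduce.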
In this paper, we address the problem of privacy protection in video surveillance. We introduce two efficient approaches to conceal regions of interest (ROIs), based on transform-domain or codestream-domain scrambling. In the first technique, the sign of selected transform coefficients is pseudorandomly flipped during encoding. In the second method, some bits of the codestream are pseudorandomly inverted. We address more specifically the case of MPEG-4, as it is today the prevailing standard in video surveillance equipment. Simulations show that both techniques successfully hide private data in ROIs while the scene remains comprehensible. Additionally, the amount of noise introduced by the scrambling process can be adjusted. Finally, the impact on coding efficiency is small, and the required computational complexity is negligible.
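The first technique, pseudorandom sign flipping of selected coefficients, can be illustrated with a toy sketch. This is not the paper's MPEG-4 implementation: Python's seeded `random.Random` stands in for a keyed pseudorandom generator, and `key` plays the role of the shared secret. Because zero coefficients are left untouched on both passes, applying the function twice with the same key regenerates the same flip pattern and descrambles.

```python
import random

def scramble_signs(coeffs, key, flip_prob=0.5):
    """Pseudorandomly flip the sign of nonzero transform coefficients.

    The same key regenerates the same flip pattern, so applying this
    function a second time with the same key descrambles the data."""
    rng = random.Random(key)  # stand-in for a keyed PRNG
    out = []
    for c in coeffs:
        if c != 0 and rng.random() < flip_prob:
            out.append(-c)
        else:
            out.append(c)
    return out
```

Only the signs change, so the coefficient magnitudes, and hence the bit rate of magnitude coding, are unaffected; this is what keeps the impact on coding efficiency small.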
A computationally fast tone mapping operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is essential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited range of HDR content and require extensive parameter tuning to yield the best subjective-quality tone-mapped output. In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone mapping operator (DeepTMO) that yields a high-resolution, high-subjective-quality tone-mapped output. Based on a conditional generative adversarial network (cGAN), DeepTMO not only learns to adapt to vast scenic content (e.g., outdoor, indoor, human, structures, etc.) but also tackles HDR-related scene-specific challenges such as contrast and brightness, while preserving fine-grained details. We explore four possible combinations of generator-discriminator architectural designs to specifically address prominent issues in HDR-related deep-learning frameworks such as blurring, tiling patterns and saturation artifacts. By exploring different influences of scales, loss functions and normalization layers under a cGAN setting, we conclude by adopting a multi-scale model for our task. To further leverage the large-scale availability of unlabeled HDR data, we train our network by generating targets using an objective HDR quality metric, namely the Tone Mapped Image Quality Index (TMQI). We demonstrate results both quantitatively and qualitatively, and show that DeepTMO generates high-resolution, high-quality output images over a large spectrum of real-world scenes. Finally, we evaluate the perceived quality of our results by conducting a pair-wise subjective study, which confirms the versatility of our method.
Efficient point cloud compression is fundamental to enabling the deployment of virtual and mixed reality applications, since the number of points to code can range in the millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset, with 51.5% BDBR savings on average. Moreover, while octree-based methods face an exponential diminution of the number of points at low bitrates, our method still produces high-resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn.
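Two ingredients named above can be sketched in isolation: the joint rate-distortion objective L = D + λR, where λ is the trade-off parameter, and decoding viewed as binary classification of the occupancy map, where each voxel is declared occupied when its predicted probability exceeds a threshold. The function names and the 0.5 threshold below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rd_loss(distortion, rate, lam):
    """Joint rate-distortion objective L = D + lambda * R; the
    trade-off parameter lambda selects an operating point."""
    return distortion + lam * rate

def decode_occupancy(probs, threshold=0.5):
    """Binary-classification view of decoding: a voxel is declared
    occupied when its predicted probability exceeds the threshold."""
    return (np.asarray(probs) > threshold).astype(np.uint8)
```

Sweeping λ traces out the rate-distortion curve; sweeping the occupancy threshold trades missed points against spurious ones at a fixed rate.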
In this paper, we describe a database containing subjective assessment scores for 78 video streams encoded with H.264/AVC and corrupted by simulating transmission over an error-prone network. The data has been collected from 40 subjects at the premises of two academic institutions. Our goal is to provide a balanced and comprehensive database to enable reproducible research results in the field of video quality assessment. In order to support research on Full-Reference, Reduced-Reference and No-Reference video quality assessment algorithms, both the uncompressed files and the H.264/AVC bitstreams of each video sequence have been made publicly available to the research community, together with the subjective results of the performed evaluations.

Index Terms: subjective video quality assessment, packet loss rate, H.264/AVC, error resilience.