In the Machine-to-Machine (M2M) transmission context, there is a strong need to reduce the amount of transmitted information using lossy compression. However, commonly used image compression methods are designed for human perception, not for the performance of Artificial Intelligence (AI) algorithms. These compression distortions are known to degrade many deep learning based architectures on several computer vision tasks. In this paper, we focus on the classification task and propose a new approach, named expert training, to enhance the resilience of Convolutional Neural Networks (CNNs) to compression distortions. We validated our approach with the MnasNet and ResNet50 architectures against image compression distortions introduced by three commonly used methods (JPEG, J2K and BPG) on the ImageNet dataset. The results show that both architectures are more robust to the tested coding artifacts when trained with the proposed expert training approach. Our code is publicly available at https://github.com/albmarie/expert_training.
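The abstract does not detail the expert training procedure itself, but the underlying idea of exposing a classifier to coding artifacts at training time can be sketched as follows. This is a minimal, hypothetical illustration in Python; the `jpeg_distort` helper and its quality range are assumptions for illustration, not the paper's method.

```python
# Minimal sketch (not the paper's exact method): re-encode training
# images with JPEG at a random quality factor so the classifier sees
# compression artifacts during training.
import io
import random
from PIL import Image

def jpeg_distort(img: Image.Image, quality_range=(10, 90)) -> Image.Image:
    """Round-trip a PIL image through JPEG at a random quality factor."""
    quality = random.randint(*quality_range)  # assumed range, not from the paper
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

# Usage: apply as a data augmentation step before the usual transforms,
# e.g. transforms.Lambda(jpeg_distort) in a torchvision pipeline.
```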
In the Video Coding for Machines (VCM) context, where visual content is compressed before being transmitted to a vision task algorithm, an appropriate trade-off between the compression level and the vision task performance must be chosen. In this paper, the robustness of a Deep Neural Network (DNN) based semantic segmentation algorithm to compression artifacts is evaluated over a total of 1486 coding configurations. The results indicate the importance of using an appropriate image resolution to overcome the block-partitioning limitations of existing compression algorithms, allowing bitrate savings of 58.3%, 49.8%, 33.5% and 24.3% at equivalent prediction accuracy for JPEG, JM, x265 and VVenC, respectively. Surprisingly, when compressed images are included at training time, JPEG achieves a 73.41% bitrate reduction over the VVC Test Model (VTM) paired with a DNN trained on pristine data, which implies that the generalization ability of the DNN must not be overlooked.
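Bitrate savings at equivalent prediction accuracy are typically reported with a Bjøntegaard-style delta-rate computation; the sketch below is written under that assumption, with task accuracy (e.g. mIoU) taking the role usually played by PSNR. The paper's exact evaluation protocol may differ.

```python
# Sketch (assumed protocol): Bjontegaard-style average bitrate difference
# at equal task accuracy, with accuracy replacing the usual PSNR axis.
import numpy as np

def bd_rate(rates_ref, acc_ref, rates_test, acc_test):
    """Average bitrate difference (%) of the test codec vs. the reference,
    integrated over the overlapping accuracy range."""
    p_ref = np.polyfit(acc_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(acc_test, np.log(rates_test), 3)
    lo = max(min(acc_ref), min(acc_test))
    hi = min(max(acc_ref), max(acc_test))
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_log_diff = ((np.polyval(int_test, hi) - np.polyval(int_test, lo))
                    - (np.polyval(int_ref, hi) - np.polyval(int_ref, lo))) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Example with made-up (rate in kbps, mIoU) points; the test codec spends
# 20% fewer bits at every accuracy level, so bd_rate returns about -20:
# bd_rate([100, 200, 400, 800], [0.55, 0.62, 0.66, 0.68],
#         [ 80, 160, 320, 640], [0.55, 0.62, 0.66, 0.68])
```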
On-the-sphere compression of omnidirectional videos is a very promising approach. First, it saves computational complexity, as it avoids projecting the sphere onto a 2D map, as is classically done. Second, and more importantly, it achieves a better rate-distortion trade-off, since neither the visual data nor its domain of definition are distorted. In this paper, the on-the-sphere compression of omnidirectional still images [1] is extended to videos. We first provide a complete review of existing spherical motion models. We then propose a new one, called tangent-linear+t. Finally, we propose a rate-distortion optimized algorithm that locally chooses the best motion model for efficient motion estimation/compensation. For that purpose, we additionally propose a finer search pattern for the motion parameters, called spherical-uniform, which leads to more accurate block prediction. The novel algorithm yields rate-distortion gains compared to methods based on a single motion model.
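The per-block model choice can be viewed as a standard Lagrangian rate-distortion decision, J = D + λR; the sketch below illustrates such a selection loop. The `estimate` method and the candidate set are assumed interfaces for illustration, not the paper's implementation.

```python
# Sketch (assumed interface): choose, for each block, the spherical motion
# model minimizing the Lagrangian cost J = D + lambda * R.
def best_motion_model(block, candidate_models, lmbda):
    """Return the motion model with the lowest cost J = D + lambda * R."""
    best_model, best_cost = None, float("inf")
    for model in candidate_models:  # e.g. rotation, tangent-linear+t, ...
        distortion, rate_bits = model.estimate(block)  # assumed API: (SSE, bits)
        cost = distortion + lmbda * rate_bits
        if cost < best_cost:
            best_model, best_cost = model, cost
    return best_model
```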
Image and video compression aims at finding an optimal trade-off between rate and distortion. In traditional encoders, this is done through Rate-Distortion Optimization (RDO) with the use of Image Quality Assessment (IQA) metrics. While it is known that most IQA metrics are designed to correlate with human perception, there is no evidence that this observation generalizes to the Video Coding for Machines (VCM) context, where the receiver is no longer a human but a machine. In this paper, we propose an evaluation protocol to measure the level of correlation between conventional Full-Reference (FR) IQA metrics and machine perception, through the semantic segmentation vision task. Experiments show a relatively low correlation between the two when measured at the block level. This observation implies the need for RDO algorithms that are better suited to Machine-to-Machine (M2M) communications. To facilitate the emergence of IQA metrics that better reflect machine perception, the code and dataset used to perform this study are made freely available at https://github.com/albmarie/iqa_m2m_segmentation.
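As a rough illustration of such a block-level correlation measurement, the sketch below pairs one FR metric (PSNR) with a per-block segmentation accuracy drop and computes their Spearman correlation. The choice of PSNR and the variable names are assumptions; the paper evaluates several FR metrics with its own protocol.

```python
# Sketch (assumed protocol): block-level correlation between an FR IQA
# metric (here PSNR) and the segmentation accuracy drop due to compression.
import numpy as np
from scipy.stats import spearmanr

def block_psnr(ref_block, dist_block, peak=255.0):
    """PSNR between a pristine and a compressed image block."""
    diff = ref_block.astype(np.float64) - dist_block.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# psnr_scores[i] and accuracy_drops[i] are measured on the same block i
# across many images and coding configurations, then correlated:
# rho, _ = spearmanr(psnr_scores, accuracy_drops)
```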