Learned Image Coding for Machines: A Content-Adaptive Approach

Le, Nam; Zhang, Honglei; Cricri, Francesco; Ghaznavi-Youvalari, Ramin; Tavakoli, Hamed R.; Rahtu, Esa

doi:10.1109/icme51207.2021.9428224

Cited by 35 publications

(20 citation statements)

References 12 publications


“…2(b), which combines compression and machine vision analysis network structure and devises joint optimization strategies. Some methods [25]-[30], which are based on existing learned image compression frameworks, obtain the reconstructed image more appropriate for analysis through joint learning. However, in most cases, the quality of the image suffers.…”

Section: B Feature Coding (mentioning)

confidence: 99%

Slimmable Multi-Task Image Compression for Human and Machine Vision

et al. 2023


In the Internet of Things (IoT) communications, visual data is frequently processed among intelligent devices using artificial intelligence algorithms, replacing humans for analyzing and decision-making while only occasionally requiring human's scrutiny. However, due to high redundancy of compressive encoders, existing image coding solutions for machine vision are not efficient at runtime. To balance the rate-accuracy performance and efficiency of image compression for machine vision while attaining high-quality reconstructed images for human vision, this paper introduces a novel slimmable multi-task compression framework for human and machine vision in visual IoT applications. Firstly, the image compression for human and machine vision under the constraint of bandwidth, latency, computational resources are modelled as a multi-task optimization problem. Secondly, slimmable encoders are employed to multiple human and machine vision tasks in which the parameters of the sub-encoder for machine vision tasks are shared among all tasks and jointly learned. Thirdly, to solve the feature match between latent representation and intermediate features of deep vision networks, feature transformation networks are introduced as decoders of machine vision feature compression. Finally, the proposed framework is successfully applied to human and machine vision tasks' scenarios, e.g., object detection and image reconstruction. Experimental results show that the proposed method outperforms baselines and other image compression approaches on machine vision tasks with higher efficiency (shorter latency) in two vision tasks' scenarios while retaining comparable quality on image reconstruction.

Index terms: Image compression, feature compression, collaborative compression, intelligent analytics, machine vision.



“…As new technologies for video applications (e.g., virtual reality, augmented reality, and point clouds) revolutionize the video coding industry, the heterogeneity and complexity of the captured data are presenting increasing challenges in the efficient compression of these data. Based on a review of the methods applied to date for video RC in the ML and DL domains, this paper argues that future ML and DL techniques can help to achieve smarter video coding [142]-[144].…”

Section: Future Work (mentioning)

confidence: 99%

Recent Advances in Rate Control: From Optimization to Implementation and Beyond

Wei, Zhou, Wang et al. 2022

Preprint


Video coding is a video compression technique that compresses the original video sequence to achieve a smaller archive file or a lower transmission bandwidth under constraints on the visual quality loss. Rate control (RC) plays a critical role in video coding. It can achieve stable stream output in practical applications, especially in real-time video applications such as video conferencing or game live streaming. Most RC algorithms directly or indirectly characterize the relationship between the bit rate (R) and quantization (Q) and then allocate bits for every coding unit to guarantee the global bit rate and video quality level. This paper comprehensively reviews the classic RC technologies used in international video standards of past generations, analyses the mathematical models and implementation mechanisms of various schemes, and compares the performance of recent state-of-the-art RC algorithms. Finally, we discuss future directions and new application areas for RC methods. We hope that this review can help support the development, implementation, and application of RC in new video coding standards.


“…These terms are defined in the same way as in [5], except for L_task, which is replaced by a proxy loss L_proxy in our setup. Due to the correlations in the intermediate level features of different vision tasks [9], we can use an intermediate feature distortion metric as a proxy for L_task, thus making the codec task-agnostic. Additionally, using a feature-based loss as such enables the training of the model with cropped images which is much more efficient.…”

Section: Baseline Image Codec Model (mentioning)

confidence: 99%

“…Additionally, using a feature-based loss as such enables the training of the model with cropped images which is much more efficient. Similar to [9,18], we define L_proxy as follows:…”

Section: Baseline Image Codec Model (mentioning)

confidence: 99%
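
The two excerpts above describe replacing the task loss with a feature-based proxy loss, but the excerpt cuts off before the definition. The sketch below is not the citing paper's formula. It only illustrates the general idea of an intermediate feature distortion metric, assuming mean squared error between features of the original and decoded images. The names `proxy_loss` and `toy_extractor` are hypothetical, and the toy extractor stands in for the intermediate layers of a real vision network.

```python
from typing import Callable, List, Sequence

# An "image" here is a flat sequence of pixel values; a real codec would use tensors.
Image = Sequence[float]
Extractor = Callable[[Image], List[float]]


def toy_extractor(image: Image) -> List[float]:
    """Hypothetical stand-in for an intermediate layer of a vision network.

    Averages each pair of neighbouring pixels, so it works on any crop
    of length >= 2 without needing task labels.
    """
    return [(image[i] + image[i + 1]) / 2 for i in range(len(image) - 1)]


def proxy_loss(extract: Extractor, original: Image, decoded: Image) -> float:
    """Mean squared error between features of the original and decoded image.

    Assumption: MSE is used as the feature distortion metric; the source
    excerpt does not state which metric the authors use.
    """
    f_orig = extract(original)
    f_dec = extract(decoded)
    if len(f_orig) != len(f_dec):
        raise ValueError("feature shapes must match")
    return sum((a - b) ** 2 for a, b in zip(f_orig, f_dec)) / len(f_orig)


if __name__ == "__main__":
    original = [0.0, 2.0, 4.0]
    decoded = [2.0, 2.0, 4.0]  # first pixel distorted by the codec
    print(proxy_loss(toy_extractor, original, decoded))
```

Because the loss compares features rather than task outputs, it needs no task-specific labels and can be evaluated on any crop the extractor accepts, which matches the task-agnostic and cropped-training points made in the excerpt.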


Bridging the Gap Between Image Coding for Machines and Humans

Zhang, Cricri et al. 2022

2022 IEEE International Conference on Image Processing (ICIP)

Self Cite


Image coding for machines (ICM) aims at reducing the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy. In many use cases, such as surveillance, it is also important that the visual quality is not drastically deteriorated by the compression process. Recent works on using neural network (NN) based ICM codecs have shown significant coding gains against traditional methods; however, the decompressed images, especially at low bitrates, often contain checkerboard artifacts. We propose an effective decoder finetuning scheme based on adversarial training to significantly enhance the visual quality of ICM codecs, while preserving the machine analysis accuracy, without adding extra bitcost or parameters at the inference phase. The results show complete removal of the checkerboard artifacts at the negligible cost of −1.6% relative change in task performance score. In the cases where some amount of artifacts is tolerable, such as when machine consumption is the primary target, this technique can enhance both pixel-fidelity and feature-fidelity scores without losing task performance.

