2016
DOI: 10.1007/s11220-016-0152-5
Recognizable or Not: Towards Image Semantic Quality Assessment for Compression

Abstract: Traditionally, image compression was optimized for the pixel-wise fidelity or the perceptual quality of the compressed images given a bit-rate budget. But recently, compressed images are increasingly utilized for automatic semantic analysis tasks such as recognition and retrieval. For these tasks, we argue that the optimization target of compression is no longer perceptual quality, but the utility of the compressed images in the given automatic semantic analysis task. Accordingly, we propose to evaluate the q…
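The abstract's evaluation idea can be illustrated with a minimal sketch: measure how well a recognizer still performs on compressed images, rather than how close the pixels stay to the original. The compressor and recognizer below are hypothetical stand-ins (a toy quantizer and a nearest-prototype classifier), not the codec or model used in the paper.

```python
# Hedged sketch: "semantic quality" as task utility after compression,
# using toy stand-ins for a real codec and a real classifier.

def compress(pixels, quality):
    """Toy lossy compressor: coarser quantization at lower quality."""
    step = max(1, (101 - quality) // 10)
    return [step * round(p / step) for p in pixels]

def recognize(pixels, prototypes):
    """Toy recognizer: nearest prototype by squared error (stand-in for a CNN)."""
    def sse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sse(pixels, prototypes[label]))

def semantic_quality(dataset, prototypes, quality):
    """Fraction of samples still recognized correctly after compression."""
    correct = 0
    for label, pixels in dataset:
        if recognize(compress(pixels, quality), prototypes) == label:
            correct += 1
    return correct / len(dataset)

# Tiny synthetic "dataset": two classes of 1-D pixel patterns.
prototypes = {"dark": [10, 20, 10, 20], "bright": [200, 210, 200, 210]}
dataset = [("dark", [12, 18, 11, 22]), ("bright", [198, 212, 202, 208])]

for q in (90, 10):
    print(q, semantic_quality(dataset, prototypes, q))
```

Under this view, two codecs at the same bit rate are ranked by recognition accuracy, not by pixel-wise error against the original.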

Cited by 25 publications (8 citation statements) · References 32 publications

“…Moreover, there are coding schemes that serve for automatic semantic analysis instead of human viewing, such as surveillance video coding. For these schemes, the quality metric shall be semantic quality [74], which remains largely unexplored. As a special note, we find that there is a tradeoff between signal fidelity, perceptual naturalness, and semantic quality [75], which implies that the optimization target shall be aligned with the actual requirement.…”
Section: Perspectives and Conclusion (mentioning, confidence: 99%)
“…1) Full-Reference Perceptual Loss Comparison on SOTS: Since dehazed images are often subsequently fed to automatic semantic analysis tasks such as recognition and detection, we argue that the optimization target of dehazing in these tasks is neither pixel-level nor perceptual-level quality, but the utility of the dehazed images in the given semantic analysis task [59]. The perceptual loss [32] was proposed to measure the semantic-level similarity of images, using the VGG recognition model pre-trained on the ImageNet dataset [60].…”
Section: B Restoration Versus High-level Vision (mentioning, confidence: 99%)
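The perceptual loss mentioned in the excerpt above compares images in a feature space rather than in pixel space. A real implementation would use activations of a pre-trained VGG network; `extract_features` below is a deliberately crude hypothetical stand-in (local averaging), so the sketch only illustrates the structure of the loss, not its actual behavior.

```python
# Hedged sketch of a perceptual (feature-space) loss: real systems compare
# activations of a pre-trained VGG network; this stub uses local averages.

def extract_features(pixels):
    """Stub 'network': local averages as crude stand-ins for conv activations."""
    return [sum(pixels[i:i + 2]) / 2 for i in range(len(pixels) - 1)]

def perceptual_loss(img_a, img_b):
    """Mean squared error in feature space rather than pixel space."""
    fa, fb = extract_features(img_a), extract_features(img_b)
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)

# Two images with different pixels but identical local averages: a large
# pixel-wise MSE, yet zero loss in this toy feature space.
a = [10, 20, 10, 20]
b = [20, 10, 20, 10]
print(perceptual_loss(a, b), perceptual_loss(a, a))
```

The design point is that the distance is taken after a feature transform, so perturbations the feature extractor is invariant to do not count as distortion.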
“…There is a contradiction in that most complicated quality metrics with high performance cannot be easily integrated into an image compression loop. Some research works have tried to do this by heuristically adjusting image compression parameters (e.g., quantization parameters) according to embedded quality metrics [5,6], but these are still not fully automatically optimized end-to-end image encoders that integrate complicated distortion metrics.…”
Section: Introduction (mentioning, confidence: 99%)
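The heuristic metric-in-the-loop tuning described in the excerpt above might be sketched as follows. The encoder and the embedded metric are toy stand-ins (uniform quantization and negative MSE), and `tune_qp`, `encode`, and `quality_metric` are hypothetical names for illustration, not APIs from the cited works [5,6].

```python
# Hedged sketch of heuristic, metric-in-the-loop parameter tuning: adjust a
# quantization parameter (QP) until an embedded quality metric meets a target.
# The codec and metric here are toy stand-ins, not a real encoder.

def encode(pixels, qp):
    """Toy 'encoder': larger QP -> coarser quantization -> more distortion."""
    return [qp * round(p / qp) for p in pixels]

def quality_metric(ref, rec):
    """Embedded metric: negative mean squared error (higher is better)."""
    return -sum((r - c) ** 2 for r, c in zip(ref, rec)) / len(ref)

def tune_qp(pixels, target, qp_max=32):
    """Pick the largest QP (roughly, the cheapest bitrate) still meeting the target."""
    best = 1
    for qp in range(1, qp_max + 1):
        if quality_metric(pixels, encode(pixels, qp)) >= target:
            best = qp
    return best

pixels = [13, 77, 41, 200, 5]
qp = tune_qp(pixels, target=-4.0)
print(qp, quality_metric(pixels, encode(pixels, qp)))
```

This is exactly the "not fully automatic" pattern the excerpt criticizes: the metric only steers an outer search over encoder parameters, rather than being differentiated through in an end-to-end trained encoder.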
“…https://github.com/tensorflow/models/tree/master/research/compression. A fixed header size of 100 bytes in JPEG2000 is added for all results.…”
(mentioning, confidence: 99%)