Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413804
Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment

Abstract: Currently, most image quality assessment (IQA) models are supervised by the MAE or MSE loss with empirically slow convergence. It is well-known that normalization can facilitate fast convergence. Therefore, we explore normalization in the design of loss functions for IQA. Specifically, we first normalize the predicted quality scores and the corresponding subjective quality scores. Then, the loss is defined based on the norm of the differences between these normalized values. The resulting "Norm-in-Norm" loss e…

Cited by 62 publications (35 citation statements)
References 29 publications
“…where Q̂^{(d)}_{d,i}. Note that the PLCC-induced loss is also considered in Ma et al (2018), Liu et al (2018) and Li et al (2020).…”
Section: Linearity-induced Loss (mentioning, confidence: 99%)
“…For NR-IQA, CNN-based methods (Bosse et al 2017; Wu et al 2020; Su et al 2020) have significantly outperformed handcrafted statistic-based approaches (Xu et al 2016) by directly extracting discriminative features from LQ images. Due to distortion diversity and content changes, the recent trend in NR-IQA (Li, Jiang, and Jiang 2020) is to incorporate semantic prior information by using models pretrained on classification databases, e.g., ImageNet (Deng et al 2009). And Su et al (Su et al 2020) … Note that the FR-teacher is pretrained and fixed only for distillation, and the trained NAR-student is applied for testing.…”
Section: Related Work (mentioning, confidence: 99%)
“…Three different fitness functions are considered for regression, namely the smooth-L1, the norm-in-norm [30], and the ranking hinge loss. The smooth-L1 loss is widely used for regression tasks because of its robustness to outliers.…”
Section: Facial Image Aesthetic Estimation (mentioning, confidence: 99%)
“…The recent norm-in-norm loss [30] facilitates faster convergence when training a CNN-based image quality assessment (IQA) model and also leads to better prediction performance than the mean absolute error (MAE) and mean squared error (MSE) losses. Its estimation is based on three steps: the computation of statistics, normalization based on these statistics, and a loss defined as the norm of the differences between the normalized values.…”
Section: Facial Image Aesthetic Estimation (mentioning, confidence: 99%)
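The three steps quoted above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact formulation: the exponents p and q, the eps stabilizer, and the 1/N averaging are assumptions chosen for clarity; see [30] for the precise definition.

```python
import numpy as np

def norm_in_norm_loss(pred, mos, p=1, q=2, eps=1e-8):
    """Sketch of a "Norm-in-Norm" style loss for IQA regression.

    Follows the three steps described in the text:
      1. compute statistics (mean and a q-norm scale) of each score vector,
      2. normalize predicted and subjective scores with those statistics,
      3. return the p-norm of the differences between the normalized values.
    p, q, eps, and the 1/N averaging are illustrative assumptions.
    """
    def normalize(x):
        centered = x - x.mean()                  # statistic 1: mean
        scale = np.linalg.norm(centered, ord=q)  # statistic 2: q-norm of centered scores
        return centered / (scale + eps)          # normalized values

    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    diff = normalize(pred) - normalize(mos)
    return float(np.sum(np.abs(diff) ** p) / len(diff))
```

Because both vectors are mean-centered and scale-normalized before the difference is taken, the loss is invariant to any positive linear rescaling of the predictions, which is what lets such a loss align with correlation-based IQA criteria better than raw MAE or MSE.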