2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.00510
MUSIQ: Multi-scale Image Quality Transformer

Cited by 231 publications (118 citation statements)
References 33 publications
“…As the input resolution increases, the performance improves, benefiting from its strong non-local capacity. Also, MaxViT shows better linear correlation compared to the SOTA method [41] which uses multi-resolution inputs.…”
Section: Image Aesthetic Assessment
confidence: 96%
“…Each image in the dataset has a histogram of scores associated with it, which we use as the ground truth label. Similar to [41,75], we split the dataset into train and test sets, such that 20% of the data is used for testing. We train MaxViT for three different input resolutions: 224 × 224, 384 × 384 and 512 × 512.…”
Section: B3 Image Aesthetics Assessment
confidence: 99%
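The evaluation protocol described in this excerpt (20% of the data held out for testing, with training runs at three input resolutions) can be sketched as follows. The function name, the use of Python's `random` module, and the seed are illustrative assumptions, not the cited authors' code.

```python
import random

def split_dataset(image_ids, test_fraction=0.2, seed=0):
    """Hypothetical helper mirroring the 80/20 train/test split
    described above; names and structure are assumptions."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]

# The three training resolutions mentioned for MaxViT.
RESOLUTIONS = [(224, 224), (384, 384), (512, 512)]

train_ids, test_ids = split_dataset(range(1000))
```

A fixed seed keeps the split reproducible across the three per-resolution training runs, so results at 224, 384, and 512 pixels are compared on the same held-out images.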
“…DBCNN [45] provided a dual bilinear network for NR-IQA. MUSIQ [46] also developed a transformer-based NR-IQA metric that exploits multi-scale information. Although numerous NR-IQA methods feature well-designed extractors and regressors, they largely neglect the specific textural and structural degradation caused by image SR.…”
Section: B General Image Quality Assessment
confidence: 99%
“…Then, they borrowed the vision transformer (ViT) architecture to further extract the ResNet output features. J. Ke et al. directly applied the ViT module as the backbone for blind IQA [14]. They kept the image aspect ratio and used multi-scale images as the input.…”
Section: A Blind Image Quality Assessment
confidence: 99%
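The aspect-ratio-preserving, multi-scale input preparation this excerpt attributes to MUSIQ can be illustrated with a minimal sketch. The function name and the target sizes are hypothetical; this is not MUSIQ's actual implementation.

```python
def multiscale_sizes(width, height, target_longer_sides=(224, 384)):
    """Compute aspect-ratio-preserving (width, height) pairs, one per
    target longer side. Hypothetical sketch of multi-scale input prep."""
    sizes = []
    for target in target_longer_sides:
        # Scale so the longer side matches the target; the shorter side
        # follows proportionally, preserving the original aspect ratio.
        scale = target / max(width, height)
        sizes.append((max(1, round(width * scale)),
                      max(1, round(height * scale))))
    return sizes
```

For example, a 640 x 480 image resized so its longer side is 224 keeps its 4:3 ratio (224 x 168), rather than being squashed into a square crop as in a standard fixed-resolution ViT pipeline.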