2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.01513
Rethinking 360° Image Visual Attention Modelling with Unsupervised Learning

Cited by 25 publications (5 citation statements)
References 48 publications
“…Self-supervision has become the new norm for learning representations given its ability to exploit unlabelled data [59,23,15,2,5,81,4,9,60,39,14]. Recent approaches devised for video understanding can be divided into two categories based on the SSL objective, namely pretext task based and contrastive learning based.…”
Section: SSL for Video Representation Learning
confidence: 99%
“…In Table 1, we present the results of our SalViT360 model and the existing models. We evaluate the performance of SalViT360 against six state-of-the-art models for 360° image and video saliency prediction, namely CP-360 [19], SalGAN360 [17], MV-SalGAN360 [20], Djilali et al [25], ATSal [21], PAVER [24]. ATSal, PAVER, and CP-360 are video saliency models; the rest are image-based models developed for the omnidirectional domain.…”
Section: Comparison With the State-of-the-Art
confidence: 99%
“…Yun et al [24] use local undistorted patches with deformable CNNs and use a ViT variant for self-attention across space and time. Djilali et al [25] used a self-supervised pre-training based on learning the association between several different views of the same scene and trained a supervised decoder for 360° saliency prediction as a downstream task. Although their approach considers the global relationship between viewports, it ignores the temporal dimension that is crucial for video understanding.…”
Section: Introduction
confidence: 99%
“…SSL has recently matched the performance of supervised learning on several computer vision benchmarks [Chen et al., 2020, Djilali et al., 2021, Bachman et al., 2019, Grill et al., 2020]. Contrastive Learning.…”
Section: Related Works (Self-Supervised Representation Learning)
confidence: 99%